Preface


This textbook is a comprehensive unified
course in linear algebra and analytic geometry based on lectures
given by the author for many years at various institutes to future
specialists in computational mathematics.

It is intended mainly for those in whose education computational
mathematics is to occupy a substantial place. Much of the instruction
in this speciality is connected with the traditional mathematical
courses. Nevertheless the interests of computational mathematics
make it necessary to introduce substantial changes in both
the methods of presentation of these courses and their content.

There are many good textbooks of linear algebra and analytic
geometry, including English textbooks. Their direct use for training
specialists in computational mathematics proves difficult, however.
To our mind, this is mainly because computers require many
more facts from linear algebra than are usually given in the available
books. The necessary additional facts as a rule fail to appear
in the conventional texts of linear algebra and are instead contained
either in papers scattered in journals or in books relating to other
branches of mathematics.

Computer students begin to get familiar with linear algebra and
analytic geometry fairly early. At the same time their scientific
world outlook begins to take shape. Therefore what is taught in this
course and how it is taught determine to a great extent the students'
future perception of computational mathematics as a whole.
Of course computer students must get a systematic and rigorous
presentation of all the fundamentals of algebra and geometry. But
they must also be made familiar as early as possible, at least briefly,
with those problems and methods which computational algebra
has accumulated.

An introduction to computational problems allows the lecture
course to be effectively oriented towards the interests of computational
mathematics and a close relation to be established between theory
and numerical methods in linear algebra. The basic material for
this is provided by the simplest facts pertaining to such topics
as round-off errors, perturbation instability of many basic notions
of linear algebra, stability of orthonormal systems, metric and
normed spaces, singular decomposition, bilinear forms and their
relation to computational processes, and so on.

Of course the incorporation of new and fairly extensive material
is impossible without substantial restructuring of the traditional
course. This book is an attempt at such a restructuring.

V. V. Voyevodin 

May 6, 1982



PART I 


Vector Spaces 


CHAPTER 1 

Sets, Elements, Operations 


1. Sets and elements 

In all areas of activity we continually have
to deal with various collections of objects united by some common
feature.

Thus, studying the design of some mechanism we may consider 
the totality of its parts. An individual object of the collection 
may be any of its parts, the feature uniting all the objects being 
the fact that they all belong to a quite definite mechanism. 

Speaking of the collection of points of a circle in the plane we 
actually speak of objects, points of a plane, that are united by the 
property that they are all the same distance from some fixed point. 

It is customary in mathematics to call a collection of objects
united by some common feature a set and to term the objects elements
of the set. It is impossible to give the notion of set a rigorous
definition. We may of course say (as we did!) that the set is a
“collection”, a “system”, a “class” and so on. This looks, however, very
much like a formal exploitation of the rich vocabulary of our language.

To define a concept it is first of all necessary to point out how it
is related to more general notions. For the concept of set this cannot
be done, since there is no more general concept for it in mathematics.
Instead of defining it we are compelled to resort to illustrations.

One of the simplest ways of describing a set is to give a complete 
list of elements constituting the set. For example, the set of all 
books available to a reader of a library is completely defined by 
their lists in the library catalogues, the set of all prices of goods 
is completely defined by a price-list and so on. However, this method
applies only to finite sets, i.e. sets that contain a finite number of
elements. Infinite sets, i.e. sets containing infinitely many
elements, cannot be defined with the aid of a list. How, for example,
can we compile a list of all real numbers?

When it is impossible or inconvenient to give a set by using
a list, it can be given by pointing out a characteristic property,
i.e. a property possessed only by the elements of the set. In problems
of finding loci, for example, a characteristic property of the set of
points which is the solution of the problem is nothing other than the
collection of conditions these points must satisfy according to the
requirements of the problem.

The description of a set may be very simple and cause no difficulties.
For example, if we take a set consisting of two numbers,
1 and 2, then clearly neither the number 3 nor a notebook nor a car
will be contained in that set. But in the general case, giving sets
by their characteristic properties sometimes results in complications.
The reasons for these are rather numerous.

One reason seems to be the insufficient definiteness of the concepts
used to describe sets. Suppose we are considering the set of
all planets of the solar system. Exactly which set is meant? There are
nine major planets known. But there are over a thousand minor
planets, or asteroids, revolving round the sun. The diameters of some
measure hundreds of kilometres, but there are also some whose
diameter is under one kilometre. As methods of observation improve,
smaller and smaller planets will be discovered, and finally the
question will arise as to where the minor planets end and the meteorites
and solar dust begin.

These are not the only difficulties with the definition of the structure
of a set. Sometimes sets, quite well defined at first sight, turn
out to be very poorly defined, if defined at all. Suppose, for example, 
some set consists of one number. Let us define that number as the 
smallest integer not definable in under a hundred words. Assume that 
only words taken from some dictionary and their grammatical 
forms are used and that the dictionary contains such words as “one”, 
“two” and so on. 

Notice that on the one hand such an integer cannot exist, for
it is defined in under a hundred words by the phrase given above,
whereas according to the meaning of those words it cannot be defined
in such a way. But on the other hand, since the number of words used
in the language is finite, there are integers that cannot be
defined in under a hundred words and hence there is a smallest
one among these integers.

The area of mathematics called set theory has accumulated many 
examples where the definition of the set is intrinsically contradictory. 
The study of the question under what conditions this may happen
has led to deep investigations in logic; we shall lay them aside,
however. Throughout the following it will be assumed that we are
considering only the sets that are defined precisely and without 
contradictions and have a structure that raises no doubts. 

As a rule, we shall denote sets by capital Latin letters A, B, ...
and elements of sets by small letters a, b, ... . We shall write
$x \in A$ if an element x is in a set A and $x \notin A$ if it is not.

We shall sometimes introduce into consideration the so-called
empty set, or null set, i.e. the set that contains no elements. It is
convenient to use this set where it is not known in advance whether
there is at least one element in the collection under consideration.

Exercises 

1. Construct finite and infinite sets. What properties 
are characteristic of them? 

2. Construct sets whose descriptions contain a contradiction. 

3. Is the set of real roots of the polynomial $x^4 + 4x^3 + 7x^2 + 4x + 1$
empty?

4. Construct sets whose elements are sets. 

5. Construct sets that contain themselves as an element. 

2. Algebraic operation 

Among all possible sets there are some on
whose elements certain operations can be performed. Suppose
we are considering the set of all real numbers. Then for each of its 
elements such operations as calculation of the absolute value of 
that element and calculation of the sine of that element can be 
defined, and for every pair of elements addition and multiplication 
can be defined. 

In the above example, note especially the following features of
the operations. One is that every operation is defined for any
element of the given set, another is that the result of every operation
is unique, and the final feature is that the result of any operation
belongs to the same set. This is by no means always the case.

An operation may be defined not for all elements of a set; for
example, calculation of logarithms is not defined for negative numbers.
Taking the square root of positive numbers is defined, but
not uniquely. However, even if an operation is uniquely defined 
for every element, its result may not be an element of the given set. 
Consider division on the set of positive integers. It is clear that for 
any two numbers of the set division is realizable but its result is 
not necessarily an integer. 

Let A be a set containing at least one element. We shall say that 
an algebraic operation is defined in A if a law is indicated by which 
any pair of elements, a and b, taken from A in a definite order, is 
uniquely assigned a third element, c, also from that set. 
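
The definition can be tried out on small finite sets. The following is only a sketch under our own assumptions: the helper name `is_algebraic` and both sample sets are illustrative and do not come from the text. Uniqueness of the result is automatic here, since a Python function returns a single value.

```python
from fractions import Fraction
from itertools import product

def is_algebraic(elements, op):
    """True if op(a, b) is defined and lies in `elements` for every ordered pair."""
    elements = set(elements)
    for a, b in product(elements, repeat=2):
        try:
            c = op(a, b)
        except (ZeroDivisionError, ValueError):
            return False  # the operation is not defined for this pair
        if c not in elements:
            return False  # the result leaves the set
    return True

# Illustrative operations (assumptions, not taken from the text)
add_mod5 = lambda a, b: (a + b) % 5      # addition of remainders modulo 5
divide = lambda a, b: Fraction(a, b)     # exact division of integers

print(is_algebraic(range(5), add_mod5))    # True
print(is_algebraic(range(1, 7), divide))   # False: 1 : 2 is not an integer
```

The second call mirrors the remark about division on the positive integers: the operation is defined for every pair, yet its result need not belong to the set.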

This operation may be called addition, and c will then be called 
the sum of a and b and designated c = a + b; it may be called
multiplication, and c will then be called the product of a and b 
and designated c = ab. 

In general, terminology and notation for an operation defined
in A will not play any significant part in what follows. As a rule,
we shall employ the notation of the sum and product regardless of
the way the operation is in fact defined. But if it is necessary to
emphasize some general properties of the algebraic operation,
then it will be designated *.

Consider some simple examples to see what features an algebraic
operation may have. Let A be the set of all positive rational numbers.
We introduce for the elements of A the usual multiplication
and division of numbers and use the conventional notation. It is
not hard to check that both operations on A are algebraic. But while
for multiplication ab = ba for all elements of A, i.e. the order
of the elements is immaterial, for division, on the contrary, it
is very essential, for the equation a : b = b : a is possible only
if a = b. Thus, although an algebraic operation is defined for an
ordered pair of elements, the ordering may turn out to be inessential.

An algebraic operation is said to be commutative if its result
is independent of the order of choosing the elements, i.e. for any
two elements a and b of a given set a * b = b * a. It is obvious that
of the conventional arithmetical operations on numbers addition
and multiplication are commutative and subtraction and division
are noncommutative.
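
As a quick illustration, commutativity can be tested on a finite sample of positive rational numbers. This is a sketch only: the sample values and the helper `is_commutative` are our own choices, and a finite sample can refute commutativity on an infinite set but cannot prove it.

```python
from fractions import Fraction
from itertools import product
import operator

def is_commutative(elements, op):
    """Check op(a, b) == op(b, a) for every ordered pair of the sample."""
    return all(op(a, b) == op(b, a) for a, b in product(elements, repeat=2))

# Arbitrary sample of positive rationals (an assumption for illustration)
sample = [Fraction(1, 2), Fraction(2, 3), Fraction(3), Fraction(5, 4)]

print(is_commutative(sample, operator.mul))      # True
print(is_commutative(sample, operator.truediv))  # False
```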

Suppose now that three arbitrary elements, a, b and c, are taken. 
Then the question naturally arises as to what meaning should be 
given to the expression a * b * c. How can the algebraic operation
defined for two elements be applied to three? 

Since we can apply an algebraic operation only to a pair of elements,
the expression a * b * c may be given a definite meaning by
bracketing either the first two or the last two elements. In the first
case the expression becomes (a * b) * c and in the second we get
a * (b * c). Consider the elements d = a * b and e = b * c. Since
they are members of the original set, (a * b) * c and a * (b * c) may
be considered as the result of applying the algebraic operation
to the elements d, c and a, e, respectively.

In general, the elements d * c and a * e may turn out to be different.
Consider again the set of positive rational numbers with the algebraic
operation of division. It is easy to see that as a rule
$(a : b) : c \ne a : (b : c)$. For example, ((3/2) : 3) : (3/4) = 2/3, but
(3/2) : (3 : (3/4)) = 3/8.
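
The two values in this example can be confirmed with exact rational arithmetic. The variable names below are ours.

```python
from fractions import Fraction

a, b, c = Fraction(3, 2), Fraction(3), Fraction(3, 4)

left = (a / b) / c    # ((3/2) : 3) : (3/4)
right = a / (b / c)   # (3/2) : (3 : (3/4))

print(left, right)    # 2/3 3/8
```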

An algebraic operation is said to be associative if for any three
elements, a, b and c, of the original set a * (b * c) = (a * b) * c.

The associativity of an operation allows one to speak of a uniquely
defined result of applying an algebraic operation to any three elements,
a, b and c, meaning by it any of the equivalent expressions
a * (b * c) and (a * b) * c, and to write a * b * c without brackets.

In the case of an associative operation one can also speak of the
uniqueness of the expression $a_1 * a_2 * \dots * a_n$ containing any
finite number of elements $a_1, a_2, \dots, a_n$. By $a_1 * a_2 * \dots * a_n$
we shall mean the following. Arrange brackets in this expression
in an arbitrary way, provided only that it can then be evaluated by
successive application of the algebraic operation to pairs of elements.
For example, for the five elements $a_1, a_2, a_3, a_4$ and $a_5$ the
brackets may be arranged either like this: $a_1 * ((a_2 * a_3) * (a_4 * a_5))$
or like this: $((a_1 * a_2) * a_3) * (a_4 * a_5)$ or in many other ways.

We prove that for an associative operation the result of a calculation
is independent of bracket arrangement. Indeed, for n = 3
this assertion follows from the definition of an associative operation.
Therefore we set n > 3 and assume that for all numbers less
than n our assertion is already proved.

Let elements $a_1, a_2, \dots, a_n$ be given and suppose that brackets
indicating the order in which to perform the operation are arranged
in some way. Notice that the final step is always to perform the
operation on the two elements $a_1 * a_2 * \dots * a_k$ and
$a_{k+1} * a_{k+2} * \dots * a_n$ for some k satisfying the condition
$1 \le k \le n - 1$. Since both expressions contain fewer than n elements,
by assumption they are uniquely defined and it remains for us to prove
that for any positive integers k and l with $k + l \le n - 1$

$$(a_1 * a_2 * \dots * a_k) * (a_{k+1} * a_{k+2} * \dots * a_n)
= (a_1 * a_2 * \dots * a_{k+l}) * (a_{k+l+1} * a_{k+l+2} * \dots * a_n).$$

Letting

$$a_1 * a_2 * \dots * a_k = b,$$
$$a_{k+1} * a_{k+2} * \dots * a_{k+l} = c,$$
$$a_{k+l+1} * a_{k+l+2} * \dots * a_n = d,$$

we get, on the basis of the associativity of the operation,

$$b * (c * d) = (b * c) * d,$$

and our assertion is proved.
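
The statement just proved can be checked by brute force on a small example. The sketch below enumerates every bracket arrangement by choosing the position k of the final operation, exactly as in the proof. The function name and the sample values are our own assumptions.

```python
from fractions import Fraction
import operator

def all_bracketings(values, op):
    """Set of results over every way of bracketing values[0] * ... * values[-1]."""
    if len(values) == 1:
        return {values[0]}
    results = set()
    for k in range(1, len(values)):          # the final step combines two blocks
        for left in all_bracketings(values[:k], op):
            for right in all_bracketings(values[k:], op):
                results.add(op(left, right))
    return results

# Five arbitrary positive rationals (illustrative values)
values = [Fraction(3, 2), Fraction(3), Fraction(3, 4), Fraction(2), Fraction(5)]

print(len(all_bracketings(values, operator.mul)))          # 1
print(len(all_bracketings(values, operator.truediv)) > 1)  # True
```

For the associative multiplication all arrangements give one value, whereas for division different arrangements give different values.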

If an operation is commutative as well as associative, then the
expression $a_1 * a_2 * \dots * a_n$ is independent of the order of its
elements. It is left as an exercise for the reader to prove that this 
assertion is true. 

It should not be supposed that the commutativity and associativity
of an operation are in some way related to each other. It is possible
to construct operations with very different combinations of
these properties. We have already seen from the examples of multiplication
and division of numbers that an operation may be commutative
and associative or noncommutative and nonassociative. Consider
two more examples. Let a set consist of three elements, a, b and c.
Give algebraic operations by these tables:

[Tables (2.1): two operation tables on the elements a, b, c; contents not recoverable]


and let the first element be always chosen from the column and the
second from the row, and let the result of the operation be taken
at the intersection of the corresponding row and column. In the
first case the operation is obviously commutative but not associative,
since, for example,

(a * b) * c = c * c = c,
a * (b * c) = a * a = a.

In the second case the operation is not commutative but is associative,
which is easy to show by a straightforward check.
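
A "straightforward check" of this kind is easy to mechanize for any operation table on a finite set. The sketch below uses two hypothetical tables of our own on a three-element set; they are not the tables (2.1) of the text, but they show the same two combinations of properties.

```python
from itertools import product

def is_commutative(table, elements):
    return all(table[x, y] == table[y, x] for x, y in product(elements, repeat=2))

def is_associative(table, elements):
    return all(table[table[x, y], z] == table[x, table[y, z]]
               for x, y, z in product(elements, repeat=3))

elements = ["r", "p", "s"]

# Hypothetical table 1: the winner of rock-paper-scissors (x * x = x).
beats = {("r", "s"): "r", ("s", "p"): "s", ("p", "r"): "p"}
winner = {}
for x, y in product(elements, repeat=2):
    winner[x, y] = x if x == y else (beats.get((x, y)) or beats[(y, x)])

# Hypothetical table 2: the result is always the first element.
first = {(x, y): x for x, y in product(elements, repeat=2)}

print(is_commutative(winner, elements), is_associative(winner, elements))  # True False
print(is_commutative(first, elements), is_associative(first, elements))    # False True
```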


Exercises 

1. Is the operation of calculating tan x on the set of 
all real numbers x algebraic? 

2. Consider the set of real numbers x satisfying the inequality | x | < 1.
Are the operations of multiplication, addition, division and subtraction
algebraic on this set?

3. Is the algebraic operation $x * y = x^2 + y$ commutative and associative
on the set of all real numbers x and y?

4. Let a set consist of a single element. How can the algebraic operation be 
defined on that set? 

5. Construct algebraic operations on a set whose elements are sets. Are 
these operations commutative, associative? 


3. Inverse operation 

Let A be a set with some algebraic operation. 
As we know, it assigns to any two elements a and b of A a third 
element c = a * b. Consider the collection C of the elements of A
that can be represented as the result of the given algebraic operation.
It is clear that regardless of the algebraic operation all the elements
of C are at the same time elements of A. It is by no means necessary,
however, for all the elements of A to be in C.

Indeed, fix in A some element f and assign it to any pair of elements
a and b of A. It is obvious that the resulting correspondence
is an algebraic operation, commutative and associative. The set C
will contain only one element f regardless of the number of elements
in A.

Exactly what elements of A are in C depends on the algebraic
operation. Let it be such that C coincides with A, i.e. let both sets
contain the same elements. Then each element of A can be represented
as the result of the given algebraic operation on some two elements
of the same set A. Of course, such a representation may be
nonunique. Nevertheless we conclude that each element of A may
be assigned definite pairs of elements of A.

Thus, the original algebraic operation generates on A some other
operation. This may not be unique, since one element may be assigned
to more than one pair. But even if it is unique, it will not be
algebraic since it is defined not for any pair of elements, but for
only one element, although this may be arbitrary. With respect to the
given algebraic operation it would be natural to call this new operation
the “inverse” one. In fact, however, by the inverse operation we
shall mean something different, something closer to the concept of
algebraic operation.

Notice that investigating the “inverse” operation is equivalent
to investigating those elements u and v which satisfy the equation

u * v = b                                  (3.1)

for different elements b. Investigation of this equation for the two
elements u and v is easy to reduce to investigation of two equations
for one element. To do this it suffices to fix one of them and to determine
the other from equation (3.1). So investigating the “inverse”
operation is mathematically equivalent to solving the equations

a * x = b,   y * a = b                     (3.2)

for the elements x and y of A, with different elements a and b of
A fixed.

Suppose (3.2) have unique solutions for any a and b. Then each 
ordered pair of elements a and b of A can be assigned uniquely 
defined elements x and y of A, i.e. two algebraic operations can be 
introduced. These are called respectively the right and the left 
inverse of the basic operation. If they exist, we shall say that the 
basic operation has an inverse. Note that the above example shows 
that an algebraic operation, even a commutative and associative one,
may lack both the right and the left inverse. 
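
On a finite set the existence of the right and the left inverse can be decided by trying every candidate solution of equations (3.2). The following is a minimal sketch under our own assumptions: the function names and the operations on remainders modulo 5 are illustrative and do not appear in the text.

```python
def right_inverse(elements, op):
    """Table {(a, b): x} with op(a, x) == b, or None if some equation
    a * x = b fails to have exactly one solution in `elements`."""
    elements = list(elements)
    table = {}
    for a in elements:
        for b in elements:
            solutions = [x for x in elements if op(a, x) == b]
            if len(solutions) != 1:
                return None
            table[a, b] = solutions[0]
    return table

def left_inverse(elements, op):
    """The same for the equations y * a = b."""
    return right_inverse(elements, lambda a, y: op(y, a))

mul5 = lambda a, b: (a * b) % 5   # multiplication of remainders modulo 5
sub5 = lambda a, b: (a - b) % 5   # subtraction of remainders modulo 5

print(right_inverse(range(5), mul5))           # None: equations with a = 0 are not uniquely solvable
print(right_inverse(range(1, 5), mul5)[2, 1])  # 3, since 2 * 3 = 6 leaves remainder 1
print(left_inverse(range(5), sub5)[2, 1])      # 3, since 3 - 2 = 1
```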

The existence of an inverse implies in fact the existence of two, 
in general different, algebraic operations, the right and the left
inverse. We are compelled therefore to speak of different elements
x and y. If, however, the algebraic operation is commutative and 
has an inverse, then obviously x = y and the right inverse coincides 
with the left inverse. 

Consider some examples. Let A be the real axis with the usual 
multiplication of numbers as algebraic operation. This has no 
inverse on the given set, since when, for example, a = 0 and b = 1,
equations (3.2) cannot hold for any numbers x and y. But if we
consider the operation of multiplication given only on the set of 
positive numbers, then the operation will now have an inverse. 

Indeed, for any positive numbers a and b there are unique positive 
numbers x and y satisfying equations (3.2). The inverse in this 
case is nothing but division of numbers. The fact that in reality 
x = y is of no interest to us now. 

The operation of addition has no inverse if it is given on the set 
of positive numbers, since equations (3.2), for example, can hold 
for no positive x and y if a = 2 and b = 1. But if the operation of
addition is given on the entire real axis, then its inverse exists and 
is nothing but subtraction of numbers. 

The example of addition and multiplication shows that a direct 
operation and its inverse may have quite different properties. The
associativity and commutativity of an algebraic operation
do not necessarily imply the associativity or commutativity
of its inverse, even if the inverse exists. Moreover, as already noted
above, a commutative and associative algebraic operation may 
simply have neither the right nor the left inverse. 

These simple examples show yet another important fact. Consider 
again multiplication on the set of positive numbers. Its right and 
left inverse coincide for this operation and are division of numbers. 
At first it may seem that now for division of numbers the inverse 
is multiplication of numbers. This is not quite so, however. 

Indeed, write the corresponding equations (3.2)

a : x = b,   y : a = b.

It is then obvious that

x = a : b,   y = a · b.

Consequently, the right inverse of division is again division, and 
the left inverse is multiplication. Thus, the inverse of an inverse 
does not necessarily coincide with the original algebraic operation. 
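
The two solutions can be verified numerically with exact fractions. The values of a and b below are arbitrary positive rationals chosen by us.

```python
from fractions import Fraction

a, b = Fraction(3, 2), Fraction(5, 7)

x = a / b   # right inverse of division: solves a : x = b
y = a * b   # left inverse of division: solves y : a = b

print(a / x == b, y / a == b)   # True True
```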


Exercises 

1. Are there right and left inverses of the algebraic 
operations given by tables (2.1)? 

2. What are the right and left inverse of the algebraic operation $x * y = x^y$
defined on the set of positive numbers x and y?

3. Prove that if the right and the left inverse coincide, then the original 
algebraic operation is commutative. 

4. Prove that if an algebraic operation has an inverse, then the right and 
the left inverse have inverses too. What are these? 

5. Construct an algebraic operation for which all four inverses of the inverse
operations coincide with the original operation.


4. Equivalence relation

Notice that in discussing above the properties 
of the algebraic operation we implicitly assumed the possibility 
of checking any two elements of a set for coincidence or noncoincidence.
Moreover, we treated coinciding elements rather freely,
never making any distinction between them. We did not assume 
anywhere that the coinciding elements were indeed one element 
rather than different objects. But actually we only used the fact 
that some group of elements, which we called equal, are the same 
in certain contexts. 

This situation occurs fairly often. Investigating general properties
of similar triangles we in fact make no distinction between any
triangles having the same angles. In terms of the properties preserved
under a similarity transformation, these triangles are indistinguishable
and could be called “equal”. Investigating the criteria of the
equality of triangles we make no distinction between triangles
that are situated in different places of the plane but can be made
to coincide if displaced.

In many different problems we shall be faced with the necessity
of partitioning one set or another into groups of elements united
according to some criterion. If none of the elements is in two different
groups, then we shall say that the set is partitioned into disjoint,
or nonoverlapping, groups or classes.

Although the criteria according to which the elements of a set 
are partitioned into classes may be very different, they are not 
entirely arbitrary. Suppose, for example, that we want to divide
all real numbers into classes, placing numbers a and b in the
same class if and only if a > b. Then no number a can be in the
same class with itself, since a is not greater than a itself. Consequently,
no partitioning into classes according to this criterion
is possible.

Let some criterion be given. We assume that with regard to any 
pair of elements a and b of a set A it can be said that either a is 
related to b by the given criterion or not. If a is related to b, then 
we shall write a ~ b and say that a is equivalent to b. 

Even the analysis of the simplest examples suggests the conditions
a criterion must satisfy for partitioning a set A into classes
according to it to be possible. Namely:

1. Reflexivity: a ~ a for all $a \in A$.

2. Symmetry: if a ~ b, then b ~ a. 

3. Transitivity: if a ~ b and b ~ c, then a ~ c. 

A criterion satisfying these conditions is called an equivalence 
relation. 
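
For a finite set the three conditions can be tested directly. The sketch below is illustrative only: the helper name and the criterion "a - b is divisible by 3" are our own assumptions, while the criterion a > b is the one rejected above.

```python
from itertools import product

def is_equivalence(elements, related):
    """Return the triple (reflexive, symmetric, transitive) for `related` on `elements`."""
    reflexive = all(related(a, a) for a in elements)
    symmetric = all(related(b, a)
                    for a, b in product(elements, repeat=2) if related(a, b))
    transitive = all(related(a, c)
                     for a, b, c in product(elements, repeat=3)
                     if related(a, b) and related(b, c))
    return reflexive, symmetric, transitive

numbers = range(-6, 7)

print(is_equivalence(numbers, lambda a, b: (a - b) % 3 == 0))  # (True, True, True)
print(is_equivalence(numbers, lambda a, b: a > b))             # (False, False, True)
```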

We prove that any equivalence relation partitions a set into
classes. Indeed, let $K_a$ be the group of elements of A equivalent to
a fixed element a. By reflexivity $a \in K_a$. We show that two groups
$K_a$ and $K_b$ either coincide or have no elements in common.

Let some element c be in $K_a$ and $K_b$, i.e. let c ~ a and c ~ b.
By symmetry a ~ c and by transitivity a ~ b and, of course,
b ~ a. If now $x \in K_a$, then x ~ a and hence x ~ b, i.e. $x \in K_b$.
Similarly, if $x \in K_b$, then it follows that $x \in K_a$. Thus, two groups
having at least one element in common completely coincide and
we have indeed obtained a partition of the set A into classes.
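
The argument also shows how the classes can be built in practice: since a class is determined by any one of its elements, each new element needs to be compared with only one representative of every class found so far. A minimal sketch, assuming that `related` really is an equivalence relation (the function name and the sample relation are ours):

```python
def equivalence_classes(elements, related):
    """Partition `elements` into classes of mutually equivalent elements."""
    classes = []
    for x in elements:
        for cls in classes:
            if related(x, cls[0]):   # one representative per class suffices
                cls.append(x)
                break
        else:
            classes.append([x])      # x starts a new class
    return classes

print(equivalence_classes(range(10), lambda a, b: (a - b) % 3 == 0))
# [[0, 3, 6, 9], [1, 4, 7], [2, 5, 8]]
```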

Any two elements may be equivalent or nonequivalent in terms
of the criterion in question. Nothing will happen if we call equivalent
elements equal (with respect to a given criterion!) and nonequivalent
elements unequal (with respect to the same criterion!).

It may seem that in doing so we ignore the meaning of the word 
“equal”, for now elements equal according to one criterion may prove 
unequal according to another. There is nothing unnatural in this,
however. In every particular problem we distinguish elements,
or do not, only in relation to those of their properties that are of interest
to us in this particular problem, and in different problems we may 
be concerned with different properties of the same elements. 

It will be assumed in what follows that, whenever necessary, 
for the elements of the set an equality criterion must be defined 
saying that an element a is or is not equal to an element b. If a 
is equal to b, then we shall write a = b, and $a \ne b$ otherwise.
It will also be assumed that the equality criterion is an equivalence 
relation. The reflexivity, symmetry and transitivity conditions 
may be regarded as reflecting the most general properties of the 
usual equality relation of numbers. 

The equality relation allows us to partition an entire set into 
classes of elements which we have decided for some reasons to consider
equal. This means that the difference between the elements
of the same class is of no importance to us. Consequently, in all 
situations to be considered in what follows the elements called 
equal must exhibit sameness. 

If the equality relation is introduced axiomatically, i.e. without
reference to the particular nature of the elements, it will be agreed
to assume that the equality sign merely implies that the elements
on its sides simply coincide, that is, that they are the same element.
When the equality sign is used in this way, the properties of reflexivity,
symmetry and transitivity require no particular convention.
Partitioning a set into classes of equal elements will make each
class consist of only one element.

Where the equality relation is introduced relying on a particular 
nature of elements it may happen that some or all classes of equal 
elements will consist of more than one element. This makes us 
impose additional requirements on the operations on elements to be 
introduced.

Indeed, as we have agreed, equal elements must exhibit sameness.
Therefore every operation to be introduced must now necessarily 
give equal results when applied to equal elements. In fact we shall 
never verify this requirement, and it will be left for the reader 
to see for himself that the given property holds for the operations 
to be introduced. 


Exercises 

1. Is it possible to divide all the countries of the world 
into classes, placing two countries in the same class if and only if they have
a common border? If not, why? 

2. Consider a set of cities with motorway communication. Say that two 
cities A and B are connected if one can go from A to B by motorway. Can the 
cities be divided into classes according to this criterion? If they can, what are 
the classes? 

3. Say that two complex numbers a and b are equal in absolute value if 
| a | = | b |. Is this criterion an equivalence relation? What is this partition
into classes? 

4. Consider the algebraic operations of addition and multiplication of complex
numbers. How do they act on classes of elements equal in absolute value?

5. Construct algebraic operations on the set defined in Exercise 2. How do 
they act on the classes? 


5. Directed line segments 

The foregoing examples may give an impression
that all talk about operations on elements of sets concerns
only operations on various number sets. It is not so, however. 
In what follows we construct many examples of other kinds of sets 
with operations, but for the present consider just one example which 
will be constantly referred to throughout the course. 

The most fundamental concepts of physics are such notions as
force, displacement, velocity, acceleration. They are all characterized
not only by a number giving their magnitude but also by some
direction. We now construct a geometrical analogue of such notions.

Let A and B be two distinct points in space. On the straight 
line through them they define in a natural way some line segment. 
It is assumed that the points are always given in a definite order, 
for example, first A is given and then B is. Now we can state a direction
on the constructed line segment, namely, the direction from
the first point A to the second point B. 

A line segment together with the direction stated on it is called 
a directed line segment with initial point A and terminal point B. 
It will be otherwise termed a vector, and A will be called the point 
of application of the vector. A vector with point of application A 
will be said to be fixed at A.

For directed line segments or vectors double notation will be
used. If it must be stressed that a directed line segment with initial
point A and terminal point B is meant, we shall write $\overrightarrow{AB}$. But
if we do not care exactly what points of the directed line segment
are its end points, then we shall use some simpler notation, small
Latin letters, for example. In drawings directed line segments will
be denoted by arrows, with arrowheads always at the terminal
point of the line segment.

In a directed line segment it is essential which of the end
points is initial and which is terminal. Directed line segments $\overrightarrow{AB}$
and $\overrightarrow{BA}$ will therefore be considered different.

So we can construct different sets whose elements are directed line
segments. Before introducing operations on elements we define what
directed line segments will be considered equal.

Consider first the (parallel) translation of a directed line segment
AB to a point C. Let C be off the straight line through A and B
(Fig. 5.1). Draw the straight line through A and C, then the straight
line through C parallel to AB, and finally the straight line through B
parallel to AC. Denote the point of intersection of the last two
lines by D. The directed line segment CD will be precisely the
result of the translation of AB to C. But if C is on the straight line
through A and B, then the directed line segment CD is obtained
by shifting the directed line segment AB along the straight line
containing it until the point A coincides with C.

Now we can give a definition of the equality of vectors. Two 
vectors are said to be equal if they can be made to coincide under 
a translation. It is not hard to see that this definition of equality 
is an equivalence relation, i.e. possesses the properties of reflexivity, 
symmetry and transitivity. 

Thus the collection of all vectors can be broken down in a natural
way into classes of equal vectors. Each class is easy to describe:
it is obtained by translating any one of its vectors
to each point of space.

Notice that there is one and only one vector, in every class of 
equal vectors, fixed at any point of space. In comparing vectors
a and b therefore we can use the following device. Fix some point
and translate to it the vectors a and b. If they completely coincide,
then a = b, and if not, a ≠ b.
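
The comparison device just described can be sketched in code. This is only an illustration under an assumption the text has not yet made: points are given by coordinate triples, and a directed line segment is a pair (initial point, terminal point). The names `displacement`, `translate` and `equal_vectors` are hypothetical helpers introduced here.

```python
def displacement(segment):
    """Coordinate differences from the initial point to the terminal point."""
    start, end = segment
    return tuple(e - s for s, e in zip(start, end))


def translate(segment, new_start):
    """Parallel translation of a directed segment to a new point of application."""
    d = displacement(segment)
    new_end = tuple(p + q for p, q in zip(new_start, d))
    return (tuple(new_start), new_end)


def equal_vectors(u, v, test_point=(0, 0, 0)):
    """Translate both vectors to one fixed point and see whether they coincide."""
    return translate(u, test_point) == translate(v, test_point)


AB = ((1, 2, 0), (4, 6, 0))
CD = ((0, 0, 5), (3, 4, 5))
BA = (AB[1], AB[0])          # same segment, opposite direction

print(equal_vectors(AB, CD))  # True: CD is AB translated to the point C
print(equal_vectors(AB, BA))  # False: AB and BA are different vectors
```

The choice of `test_point` does not affect the answer, which reflects the fact that every class of equal vectors has exactly one representative fixed at any given point.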

Besides the set consisting of all vectors of a space we shall often 
deal with other sets. These will mainly be sets of vectors either 
parallel to some straight line or lying on it or parallel to some plane 
or lying in it. Such vectors will be called respectively collinear and 
coplanar. Of course, on the sets of collinear and coplanar vectors 
the above definition of the equality of vectors is preserved. 

We shall also consider the so-called zero directed line segments
whose initial and terminal points coincide. The direction of zero
vectors is not defined and they are all considered equal by definition.
If it is not necessary to specify the limiting points of a zero
vector, then we shall denote that vector by 0.

Also it will be assumed by definition that any zero vector is 
parallel to any straight line and any plane. Throughout the following 
therefore, unless otherwise specified, the set of vectors of a space, 
as well as any set of collinear or coplanar vectors, will be assumed 
to include the set of all zero vectors. This should not be forgotten. 

Exercises 

1. Prove that the nonzero vectors of a space can be 
partitioned into classes of nonzero collinear vectors. 

2. Prove that any class of nonzero collinear vectors can be partitioned into 
classes of nonzero equal vectors. 

3. Prove that any class of nonzero equal vectors is entirely in one and only 
one class of nonzero collinear vectors. 

4. Can the nonzero vectors of a space be partitioned into classes of coplanar 
vectors? If not, why? 

5. Prove that any set of nonzero coplanar vectors can be partitioned into 
classes of nonzero collinear vectors. 

6. Prove that any pair of different classes of nonzero collinear vectors is
entirely in one and only one set of nonzero coplanar vectors.

6. Addition of directed line segments

As already noted, force, displacement, velocity
and acceleration are the prototypes of the directed line segments
we have constructed. If these line segments are to be useful in solving
various physical problems, we must take into account the corresponding
physical analogies when introducing operations.

Well known is the operation of addition of forces performed by
the so-called parallelogram law. The same law is used to add displacements,
velocities and accelerations. According to the introduced
terminology this operation is algebraic, commutative and associative.
Our immediate task is to construct a similar operation on
directed line segments.

The operation of vector addition is defined as follows. Suppose
it is necessary to add vectors a and b. Translate the vector b to
the terminal point of a (Fig. 6.1). Then the sum a + b is the vector
whose initial point coincides with the initial point of a and whose
terminal point coincides with the terminal point of b. This rule is usually
called the triangle law.

It is obvious that vector addition is 
an algebraic operation. We shall prove 
that it is commutative and associative. 

To establish the commutativity of
addition suppose first that a and b are
not collinear. Apply them to a common origin O (Fig. 6.2). Denote
by A and B the terminal points of a and b respectively and
consider the parallelogram OBCA. It follows from the definition
of the equality of vectors that

$$\overrightarrow{BC} = \overrightarrow{OA} = a, \qquad \overrightarrow{AC} = \overrightarrow{OB} = b.$$

But then the same diagonal OC of the parallelogram OBCA is simultaneously
a + b and b + a. The case of collinear a and b is obvious.




Notice that incidentally we have obtained another way of con¬ 
structing a vector sum. Namely, if on vectors a and b fixed at one 
point we construct a parallelogram, then its diagonal fixed at the 
same point will be the sum a + b. 

To prove the associativity of addition, apply a to an arbitrary 
point O, b to the terminal point of a, and c to the terminal point 
of b (Fig. 6.3). Denote by A, B and C the terminal points of a, b 
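and c. Then

$$(a + b) + c = (\overrightarrow{OA} + \overrightarrow{AB}) + \overrightarrow{BC} = \overrightarrow{OB} + \overrightarrow{BC} = \overrightarrow{OC},$$
$$a + (b + c) = \overrightarrow{OA} + (\overrightarrow{AB} + \overrightarrow{BC}) = \overrightarrow{OA} + \overrightarrow{AC} = \overrightarrow{OC}.$$

From the transitivity of the equality relation of vectors we conclude
that the operation is also associative.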

These properties of vector addition allow us to calculate the sum
of any number of vectors. If we apply a vector $a_2$ to the terminal
point of $a_1$, $a_3$ to the terminal point of $a_2$ and so on, and finally
a vector $a_n$ to the terminal point of $a_{n-1}$, then the sum
$a_1 + a_2 + \ldots + a_n$ will be a vector whose initial point coincides with the
initial point of $a_1$ and whose terminal point coincides with the
terminal point of $a_n$. This rule of constructing a vector sum is called
the polygon law.
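
The triangle and polygon laws lend themselves to a short computational sketch. It assumes, beyond what the text has introduced so far, that points are coordinate triples and that a vector is a pair (initial point, terminal point); the helper names `translate`, `add`, `polygon_sum` and `equal_vectors` are hypothetical.

```python
def displacement(segment):
    """Coordinate differences from the initial point to the terminal point."""
    start, end = segment
    return tuple(e - s for s, e in zip(start, end))


def translate(segment, new_start):
    """Parallel translation of a vector to a new point of application."""
    new_end = tuple(p + d for p, d in zip(new_start, displacement(segment)))
    return (tuple(new_start), new_end)


def add(u, v):
    """Triangle law: move v to the terminal point of u, then join the free ends."""
    moved_v = translate(v, u[1])
    return (u[0], moved_v[1])


def polygon_sum(vectors):
    """Polygon law: chain the vectors one after another."""
    total = vectors[0]
    for v in vectors[1:]:
        total = add(total, v)
    return total


def equal_vectors(u, v):
    """Two vectors are equal if a translation makes them coincide."""
    return displacement(u) == displacement(v)


a = ((0, 0, 0), (1, 2, 0))
b = ((5, 5, 5), (4, 7, 6))
c = ((1, 0, 1), (1, 3, -1))

print(equal_vectors(add(a, b), add(b, a)))      # commutativity (as equal vectors)
print(add(add(a, b), c) == add(a, add(b, c)))   # associativity
print(polygon_sum([a, b, c]))
```

Note that a + b and b + a come out fixed at different points, so commutativity is checked with `equal_vectors` rather than with `==`.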

We now discuss the existence of an inverse for vector addition.
As is known, to answer this question it is necessary to investigate
the existence and uniqueness of the solution of the equations

a + x = b, y + a = b 

for arbitrary vectors a and b. By virtue of the commutativity of 
the basic operation it is obvious that it suffices to examine only 
one of the equations. 

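Take an arbitrary directed line segment AB. Using an elementary
geometric construction we establish that always

$$\overrightarrow{AB} + \overrightarrow{BA} = 0, \qquad \overrightarrow{AB} + 0 = \overrightarrow{AB}. \qquad (6.1)$$

Therefore the equation

$$\overrightarrow{AB} + x = \overrightarrow{CD} \qquad (6.2)$$

for any vectors AB and CD will clearly have at least one solution,
for example,

$$x = \overrightarrow{BA} + \overrightarrow{CD}. \qquad (6.3)$$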

Suppose (6.2) holds for some other vector z as well, i.e.

$$\overrightarrow{AB} + x = \overrightarrow{CD}, \qquad \overrightarrow{AB} + z = \overrightarrow{CD}.$$

Then adding BA to both sides of these equations we get, in view
of (6.1), x = BA + CD, z = BA + CD and hence x = z.

Thus, the operation of vector addition has an inverse. It is vector
subtraction. If for vectors a, b and c we have a + c = b, then we
write in symbols c = b − a. The vector b − a uniquely determined
by the vectors a and b is called the vector difference. The justification
of this notation will be given somewhat later.

It is easy to give a rule for constructing the difference of two
given vectors a and b. Apply these to a common point and construct
a parallelogram on them (Fig. 6.4). We have already shown above
that one of the parallelogram diagonals is the sum of the given
vectors. The other diagonal is easily seen to be the difference of the
same vectors. This rule of constructing the sum and the difference
of vectors is usually called the parallelogram law.
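
As a small numerical illustration of formula (6.3), the sketch below solves AB + x = CD. It assumes coordinate triples for points and represents each class of equal vectors by its coordinate differences; both conventions, and the helper names, are introduced here only for the example.

```python
def displacement(segment):
    """Represent the class of vectors equal to `segment` by coordinate differences."""
    start, end = segment
    return tuple(e - s for s, e in zip(start, end))


def reverse(segment):
    """Interchange the initial and terminal points: AB becomes BA."""
    return (segment[1], segment[0])


def add_classes(d1, d2):
    """Sum of two classes of equal vectors, computed on their representatives."""
    return tuple(p + q for p, q in zip(d1, d2))


AB = ((0, 0, 0), (2, 1, 0))
CD = ((1, 1, 1), (4, 0, 3))

# Formula (6.3): x = BA + CD.
x = add_classes(displacement(reverse(AB)), displacement(CD))

print(x)                                                      # (1, -2, 2)
print(add_classes(displacement(AB), x) == displacement(CD))  # True: AB + x = CD
```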

Notice that we could define addition not for the set of all vectors
of a space but only for one of the sets of collinear or coplanar vectors.
The sum of two vectors of any such set
will again be in the same set. The operation
of vector addition therefore remains
algebraic in this case too. Moreover, it
preserves all its properties and, what
is especially important, it has as before
an inverse. The validity of the last
assertion follows from formula (6.3). If
vectors AB and CD are parallel to some straight line or plane,
then it is obvious that so is the vector
BA + CD or equivalently the difference vector CD − AB.

Thus, the operation of vector addition is algebraic, commutative
and associative, and has an inverse on the sets of three types: on the
set of vectors of a space, on the set of collinear vectors and on the
set of coplanar vectors.

Fig. 6.4

Exercises 

1. Three forces equal in magnitude and directed along 
the edges of a cube are applied to one of its vertices. What is the direction of 
the sum of these forces? 

2. Let three different classes of collinear vectors be given. When can any 
vector of a space be represented as the sum of three vectors of these classes? 

3. Applied to the vertices of a regular polygon are forces equal in magnitude 
and directed to its centre. What is the sum of these forces? 

4. What is the set of all sums of vectors taken from two different classes of
collinear vectors?


7. Groups 

Sets with one algebraic operation are in 
a sense the simplest and it is therefore natural to begin our studies 
just with such sets. We shall assume the properties of an operation 
to be axioms and then deduce their consequences. This will allow us 
later on to immediately apply the results of our studies to all sets 
where the operations have similar properties, regardless of specific 
features. 

A group is a set G with one algebraic operation, associative (although
not necessarily commutative), for which there must exist
an inverse.

Notice that the inverse of an operation cannot be considered to
be a second independent operation in a group, since it is defined
in terms of the basic operation. As is customary in group theory,
we call the operation given in G multiplication and use the corresponding
notation. Before considering the various examples of
groups we deduce the simplest consequences following from the
definition.

Take an element a of a group G. The existence of an inverse in
the group implies the existence of a unique element $e_a$ such that
$a e_a = a$. Consequently, this element plays the same part in
multiplying the element a by it on the right as unity does in multiplying
numbers. Suppose further that b is any other element of the group.
It is obvious that there is an element y satisfying ya = b. We now
get

$$b = ya = y (a e_a) = (ya) e_a = b e_a.$$

So $e_a$ plays the part of the right unity with respect to all elements
of G, not only with respect to a. An element with such a property
must be unique. Indeed, all such elements satisfy ax = a, but by
the definition of the inverse of an operation this equation has a
unique solution. Denote the resulting element by $e'$.

Similarly we can prove the existence and uniqueness in G of $e''$
satisfying $e''b = b$ for every b in G. In fact $e'$ and $e''$ coincide, which
follows from $e''e' = e''$ and $e''e' = e'$.

Thus we have obtained a first important consequence: in any 
group G there is a unique element e satisfying 

ae = ea = a 

for every a in G. It is called the identity (or identity element) of 
a group G. 

The definition of the inverse also implies the existence and uniqueness
for any a of elements $a'$ and $a''$ such that

$$aa' = e, \qquad a''a = e.$$

They are called the right and the left inverse element respectively.
It is easy to show that in this case they coincide. Indeed, consider
an element $a''aa'$ and calculate it in two different ways. We have

$$a''aa' = a''(aa') = a''e = a'',$$
$$a''aa' = (a''a)a' = ea' = a'.$$

Consequently, $a' = a''$. This element is the inverse of a and denoted
by $a^{-1}$.

Now we have obtained another important consequence: in any
group G every element a has a unique inverse element $a^{-1}$ for which

$$aa^{-1} = a^{-1}a = e. \qquad (7.1)$$

Because of the associativity of the group operation we can speak
of the uniqueness of the product of any finite number of elements
of a group given (in view of the possible non-commutativity of
the group operation) in a definite order. Taking into account (7.1)
it is not hard to indicate the general formula for the inverse element
of the product. Namely,

$$(a_1 a_2 \ldots a_n)^{-1} = a_n^{-1} a_{n-1}^{-1} \ldots a_1^{-1}. \qquad (7.2)$$

From (7.1) it follows that the inverse element of $a^{-1}$ is the element
a and the inverse of the identity element is the identity element,
i.e.

$$(a^{-1})^{-1} = a, \qquad e^{-1} = e. \qquad (7.3)$$
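
Formula (7.2) can be tried out on a noncommutative example. The sketch below uses permutations of three symbols with composition as the group operation; this particular group is an assumption made for illustration (the text promises noncommutative examples only later), and `compose`, `inverse` and `product` are hypothetical helper names.

```python
def compose(p, q):
    """Product pq of two permutations stored as tuples: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    """The permutation that undoes p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


def product(perms):
    """Product of several permutations taken in the given order."""
    result = tuple(range(len(perms[0])))   # identity permutation
    for p in perms:
        result = compose(result, p)
    return result


a1 = (1, 0, 2)
a2 = (0, 2, 1)
a3 = (1, 2, 0)

lhs = inverse(product([a1, a2, a3]))
rhs = product([inverse(a3), inverse(a2), inverse(a1)])      # order reversed, as in (7.2)
wrong = product([inverse(a1), inverse(a2), inverse(a3)])    # order kept

print(lhs == rhs)     # True
print(lhs == wrong)   # False: the order of the factors matters here
```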

Verifying that a set with one associative operation is a group is
greatly facilitated by the fact that in the definition of a group the
requirement that the inverse operation should exist can be replaced
by the assumption about the existence of an identity and inverse
elements, on only one side (say, right) and without the assumption
that they are unique. More precisely, we have the following

Theorem 7.1. A set G with one associative operation is a group if G
has at least one element e with the property ae = a for every a in G
and with respect to that element any element a in G has at least one
right inverse element $a^{-1}$, i.e. $aa^{-1} = e$.

Proof. Let $a^{-1}$ be one of the right inverses of a. We have

$$aa^{-1} = e = ee = eaa^{-1}.$$

Multiply both sides of this equation on the right by one of the
right inverses of $a^{-1}$. Then ae = eae, from which a = ea since e is
the right identity for G. Thus the element e is found to be also the
left identity for G.

If now $e'$ is an arbitrary right identity and $e''$ is an arbitrary
left identity, then it follows from $e''e' = e'$ and $e''e' = e''$ that
$e' = e''$, i.e. any right identity is equal to any left identity. This
proves the existence and uniqueness in G of an identity element
which we again denote by e.

Further, for any right inverse element $a^{-1}$

$$a^{-1} = a^{-1}e = a^{-1}aa^{-1}.$$

Multiply both sides of this equation on the right by a right inverse
of $a^{-1}$. Then $e = a^{-1}a$, i.e. $a^{-1}$ is simultaneously a left inverse of a.
If now $a^{-1\prime}$ is an arbitrary right inverse of a and $a^{-1\prime\prime}$ is an arbitrary
left inverse, then from

$$a^{-1\prime\prime} a a^{-1\prime} = (a^{-1\prime\prime} a) a^{-1\prime} = e a^{-1\prime} = a^{-1\prime},$$
$$a^{-1\prime\prime} a a^{-1\prime} = a^{-1\prime\prime} (a a^{-1\prime}) = a^{-1\prime\prime} e = a^{-1\prime\prime}$$

it follows that $a^{-1\prime} = a^{-1\prime\prime}$. This implies the existence and uniqueness
for any element a in G of an inverse element $a^{-1}$.

Now it is easy to show that the set G is a group. Indeed, ax = b
and ya = b clearly hold for the elements

$$x = a^{-1}b, \qquad y = ba^{-1}.$$

Suppose that there are other solutions, for example, an element z
for the first equation. Then ax = b and az = b yield ax = az.
Multiplying both sides on the left by $a^{-1}$ we get x = z. So the set G
is a group.
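
For a finite set the criterion of Theorem 7.1 can be checked by brute force and compared with the original definition (unique solvability of ax = b and ya = b). The sketch below is only an illustration on residues modulo n, which are taken from the exercises of this section; the function names are hypothetical.

```python
from itertools import product


def is_closed(elements, op):
    """The operation is algebraic on the set: every result stays in the set."""
    return all(op(a, b) in elements for a, b in product(elements, repeat=2))


def is_associative(elements, op):
    return all(op(op(a, b), c) == op(a, op(b, c))
               for a, b, c in product(elements, repeat=3))


def satisfies_theorem_7_1(elements, op):
    """Associative, with some right identity e and right inverses relative to e."""
    if not (is_closed(elements, op) and is_associative(elements, op)):
        return False
    for e in elements:
        if all(op(a, e) == a for a in elements):                      # right identity
            if all(any(op(a, x) == e for x in elements) for a in elements):
                return True
    return False


def is_group_by_definition(elements, op):
    """Associative, and ax = b, ya = b each have exactly one solution."""
    if not (is_closed(elements, op) and is_associative(elements, op)):
        return False
    return all(
        sum(op(a, x) == b for x in elements) == 1
        and sum(op(y, a) == b for y in elements) == 1
        for a, b in product(elements, repeat=2)
    )


def add_mod(n):
    return lambda a, b: (a + b) % n


def mul_mod(n):
    return lambda a, b: (a * b) % n


examples = {
    "0..5, mod 6 addition": (list(range(6)), add_mod(6)),
    "1..6, mod 7 multiplication": (list(range(1, 7)), mul_mod(7)),
    "1..5, mod 6 multiplication": (list(range(1, 6)), mul_mod(6)),
    "0..5, mod 6 multiplication": (list(range(6)), mul_mod(6)),
}
for name, (elements, op) in examples.items():
    print(name, satisfies_theorem_7_1(elements, op), is_group_by_definition(elements, op))
```

On these four examples the two tests agree: the first two sets are groups and the last two are not.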

A group is said to be commutative or Abelian if the group operation
is commutative. In that case the operation is as a rule called
addition and the sum notation a + b is written instead of the
product notation ab. The identity of an Abelian group is called
the zero element and designated 0. The inverse of the operation is
called subtraction, and the inverse element is called the negative
element. It is denoted by −a. It will be assumed that by definition
the difference symbol a − b denotes the sum a + (−b).

But if for some reason we call the operation in a commutative
group multiplication, then its inverse will be assumed to be division.
The now equal products $a^{-1}b$ and $ba^{-1}$ will be denoted by b/a and
called the quotient of b by a.

Exercises 

Prove that the following sets are Abelian groups. 
Everywhere the name of the operation reflects its content rather than notation. 

1. The set consists of integers; the operation is addition of numbers. 

2. The set consists of complex numbers, except zero; the operation is
multiplication of numbers.

3. The set consists of integer multiples of 3; the operation is addition of
numbers.

4. The set consists of positive rationals; the operation is multiplication
of numbers.

5. The set consists of numbers of the form $a + b\sqrt{2}$, where a and b are
rationals not both equal to zero; the operation is multiplication of numbers.

6. The set consists of a single element a; the operation is called addition and
defined by a + a = a.

7. The set consists of integers 0, 1, 2, ..., n − 1; the operation is called
mod n addition and consists in calculating the nonnegative remainder less than n
of the division of the sum of two numbers by the number n.

8. The set consists of integers 1, 2, 3, ..., n − 1, where n is a prime; the
operation is called mod n multiplication and consists in calculating the nonnegative
remainder less than n of the division of the product of two numbers by the
number n.

9. The set consists of collinear directed line segments; the operation is
addition of directed line segments.

10. The set consists of coplanar directed line segments; the operation is
addition of directed line segments.

11. The set consists of directed line segments of a space; the operation is
addition of directed line segments.

As regards the last three examples, notice that the zero element of an Abelian
group of directed line segments is a zero directed line segment and that the
inverse line segment to AB is BA. It follows from what was proved above that
they are unique. Examples of noncommutative groups will be given later.

8. Rings and fields 

Consider a set K in which two operations
are introduced. Call one of them addition and the other multiplication,
and use the corresponding notation. Assume that both are
related by the distributive law, i.e. for any three elements a, b and c
of K

(a + b) c = ac + bc, a (b + c) = ab + ac.

The set K is said to be a ring if two operations are defined in it,
addition and multiplication, both associative as well as related
by the distributive law, addition being commutative and possessing
an inverse. A ring is said to be commutative if multiplication is
commutative and noncommutative otherwise.

Notice that any ring is an additive Abelian group. Consequently,
there is a unique zero element 0 in it. The element possesses the
property that for any element a of the ring

a + 0 = a.

We gave the definition of the zero element only with respect to
the operation of addition. But it plays a particular role with respect
to multiplication as well. Namely, in any ring the product of any
element by the zero element is the zero element. Indeed, let a be any
element of K; then

$$a \cdot 0 = a (0 + 0) = a \cdot 0 + a \cdot 0.$$

Adding to each side the element $-a \cdot 0$ we get $a \cdot 0 = 0$. It can be
proved similarly that $0 \cdot a = 0$.

Using this property of the zero element it can be established 
that in any ring for any elements a and b 

(—a) b = — (ab). 

Indeed, 

ab + (−a) b = (a + (−a)) b = 0·b = 0,

i.e. the element (—a) b is the negative of ab. According to our 
notation we may write it as — (ab). 

Now it is easy to show that the distributive law is true for the
difference of elements. We have

$$(a - b) c = (a + (-b)) c = ac + (-b) c = ac + (-(bc)) = ac - bc,$$
$$a (b - c) = a (b + (-c)) = ab + a (-c) = ab + (-(ac)) = ab - ac.$$

The distributive law, i.e. the usual rule of removing brackets,
is the only requirement in the definition of a ring relating addition
and multiplication. Only due to this law does a simultaneous study of
the two operations give more than could be obtained if they were
studied separately.

We have just proved that algebraic operations in a ring possess
many of the customary properties of operations on numbers. It should
not be supposed, however, that any property of addition and
multiplication of numbers is preserved in any ring, be it even a commutative
one. Thus multiplication of numbers has a property converse
to that of multiplication by a zero element. Namely, if a product
of two numbers is equal to zero, then at least one of the multipliers
equals zero. In an arbitrary commutative ring this property does
not necessarily hold, i.e. a product of elements not equal to the zero
element may be zero.

Nonzero elements whose product is a zero element are called
zero divisors. Their existence in a ring makes investigating such rings
substantially more difficult and prevents one from drawing a deep
analogy between numbers and elements of a commutative ring.
This analogy can be drawn, however, for rings having no zero
divisors.
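
A concrete case may help here. The sketch below lists the zero divisors among the integers 0, 1, ..., n − 1 with mod n addition and multiplication, the rings that appear in the exercises of this section; the function name `zero_divisors` is a hypothetical helper.

```python
def zero_divisors(n):
    """Nonzero residues a for which some nonzero residue b gives a*b = 0 (mod n)."""
    return sorted({a
                   for a in range(1, n)
                   for b in range(1, n)
                   if (a * b) % n == 0})


print(zero_divisors(6))   # [2, 3, 4], since 2*3 = 6 and 3*4 = 12 are multiples of 6
print(zero_divisors(7))   # [], there are no zero divisors modulo a prime
```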

Suppose in a commutative ring with respect to the operation of
multiplication there is an identity element e and each nonzero
element a has an inverse element $a^{-1}$. It is not hard to prove that
both the identity and the inverse element are unique, but what is
most important is the fact that now the ring has no zero divisors.
Indeed, let ab = 0, but $a \neq 0$. Multiplying both sides of this equation
on the left by $a^{-1}$ we get

$$a^{-1} ab = (a^{-1} a) b = eb = b$$

and certainly $a^{-1} \cdot 0 = 0$. Consequently, b = 0.

From the absence of zero divisors it follows that from any equation
we can cancel the nonzero common multiplier. If ca = cb
and $c \neq 0$, then c (a − b) = 0, from which we conclude that a − b = 0,
i.e. a = b.

A commutative ring P in which there is an identity element and 
each nonzero element has an inverse is called a field. 

Writing the quotient a/b as the product $ab^{-1}$, it is easy to show
that any field preserves all the usual rules of handling fractions, in
terms of addition, subtraction, division and multiplication. Namely,

$$\frac{a}{b} \pm \frac{c}{d} = \frac{ad \pm bc}{bd}, \qquad \frac{a}{b} \cdot \frac{c}{d} = \frac{ac}{bd}, \qquad \frac{-a}{b} = -\frac{a}{b}.$$

Besides, a/b = c/d if and only if ad = bc, provided, of course,
$b \neq 0$ and $d \neq 0$. It is left as an exercise for the reader to check
that these assertions are true.

So in terms of the usual rules of handling fractions all fields
are indistinguishable from the set of numbers. For this reason the
elements of any field will be called numbers if of course this name
does not lead to any ambiguity. As a rule, the zero element of any
field will be designated as 0 and the identity element as 1.
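
The rules for fractions can be checked exhaustively in a small finite field. The sketch below assumes the field of integers 0, 1, ..., 6 with mod 7 operations (exercise 13 of this section with n = 7) and computes inverses by Fermat's little theorem, a fact not used in the text; `inverse`, `quotient` and `fraction_rules_hold` are hypothetical names.

```python
P = 7  # a prime, so the residues 0..P-1 form a field under mod P operations


def inverse(a):
    """Inverse of a nonzero residue: a^(P-2) * a = a^(P-1) = 1 (mod P)."""
    return pow(a, P - 2, P)


def quotient(a, b):
    """The quotient a/b, defined as the product of a and the inverse of b."""
    return (a * inverse(b)) % P


def fraction_rules_hold():
    """Check the sum, difference, product and equality rules for all fractions."""
    nonzero = range(1, P)
    for a in range(P):
        for c in range(P):
            for b in nonzero:
                for d in nonzero:
                    bd = (b * d) % P
                    ab, cd = quotient(a, b), quotient(c, d)
                    if (ab + cd) % P != quotient((a * d + b * c) % P, bd):
                        return False
                    if (ab - cd) % P != quotient((a * d - b * c) % P, bd):
                        return False
                    if (ab * cd) % P != quotient((a * c) % P, bd):
                        return False
                    # a/b = c/d if and only if ad = bc
                    if (ab == cd) != ((a * d) % P == (b * c) % P):
                        return False
    return True


print(fraction_rules_hold())
```

A finite check of this kind is of course no substitute for the proof left to the reader; it only shows the rules at work in a field whose elements are not ordinary numbers.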

We shall now list all the general facts we need about the elements 
of any field in what follows. 

A. To every pair of elements a and b there corresponds an element 
a + b, called the sum of a and b, and 

(1) addition is commutative, a + b = b + a, 

(2) addition is associative, a + (b + c) = (a + b) + c, 

(3) there is a unique zero element 0 such that a + 0 = a for 
any element a, 

(4) for every element a there is a unique negative element —a 
such that a + (—a) = 0. 

B. To every pair of elements a and b there corresponds an element
ab, called the product of a and b, and

(1) multiplication is commutative, ab = ba,

(2) multiplication is associative, a (bc) = (ab) c,

(3) there is a unique identity element 1 such that $a \cdot 1 = 1 \cdot a = a$
for any element a,

(4) for every nonzero element a there is a unique inverse element
$a^{-1}$ such that $aa^{-1} = a^{-1}a = 1$.

C. The operations of addition and multiplication are connected
by the following relation: multiplication is distributive over addition,
(a + b) c = ac + bc.

These facts lay no claim to logical independence and are but 
a convenient way of characterizing elements. Properties A describe 
the field in terms of the operation of addition and say that with 
respect to this operation the field is an Abelian group. Properties B 
describe the field in terms of the operation of multiplication and 
say that with respect to this operation the field becomes an Abelian 
group if we eliminate from it the zero element. Property C describes 
the relation of the two operations to each other. 
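
Properties A, B and C can be verified mechanically for a finite set given by its operation tables. The sketch below does this for the two-element set of exercise 12 in this section, where a acts as the zero element and b as the identity; the table encoding and the function `field_facts_hold` are assumptions made for the illustration.

```python
from itertools import product

ELEMENTS = ("a", "b")
ADD = {("a", "a"): "a", ("b", "b"): "a", ("a", "b"): "b", ("b", "a"): "b"}
MUL = {("a", "a"): "a", ("a", "b"): "a", ("b", "a"): "a", ("b", "b"): "b"}


def add(x, y):
    return ADD[(x, y)]


def mul(x, y):
    return MUL[(x, y)]


def field_facts_hold(elements, add, mul):
    """Brute-force check of properties A(1)-(4), B(1)-(4) and C."""
    pairs = list(product(elements, repeat=2))
    triples = list(product(elements, repeat=3))
    for op in (add, mul):
        if not all(op(x, y) == op(y, x) for x, y in pairs):                 # (1)
            return False
        if not all(op(x, op(y, z)) == op(op(x, y), z) for x, y, z in triples):  # (2)
            return False
    zeros = [z for z in elements if all(add(x, z) == x for x in elements)]
    ones = [u for u in elements if all(mul(x, u) == x for x in elements)]
    if len(zeros) != 1 or len(ones) != 1:                                    # (3)
        return False
    zero, one = zeros[0], ones[0]
    for x in elements:                                                       # (4)
        if sum(add(x, y) == zero for y in elements) != 1:
            return False
        if x != zero and sum(mul(x, y) == one for y in elements) != 1:
            return False
    return all(mul(add(x, y), z) == add(mul(x, z), mul(y, z))               # C
               for x, y, z in triples)


print(field_facts_hold(ELEMENTS, add, mul))
```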


Exercises 

Prove that sets 1-7 are rings and not fields and that 
sets 8-13 are fields. Everywhere the name of the operation reflects its content 
rather than notation. 

1. The set consists of integers; the operations are addition and
multiplication of numbers.

2. The set consists of integer multiples of some number n; the operations
are addition and multiplication of numbers.

3. The set consists of real numbers of the form $a + b\sqrt{2}$, where a and b
are integers; the operations are addition and multiplication of numbers.

4. The set consists of polynomials with real coefficients in a single variable t,
including constants; the operations are addition and multiplication of
polynomials.

5. The set consists of a single element a; the operations are defined by
a + a = a and a·a = a.

6. The set consists of integers 0, 1, 2, ..., n − 1, where n is a composite
number; the operations are mod n addition and mod n multiplication.

7. The set consists of pairs (a, b) of integers; the operations are defined by
the formulas

(a, b) + (c, d) = (a + c, b + d); (a, b)·(c, d) = (ac, bd).

8. The set consists of rational numbers; the operations are addition and
multiplication of numbers.

9. The set consists of real numbers; the operations are addition and
multiplication of numbers.

10. The set consists of complex numbers; the operations are addition and
multiplication of numbers.

11. The set consists of real numbers of the form $a + b\sqrt{2}$, where a and b
are rationals; the operations are addition and multiplication of numbers.

12. The set consists of two elements a and b; the operations are defined by the
equations

$$a + a = b + b = a, \qquad a + b = b + a = b,$$
$$a \cdot a = a \cdot b = b \cdot a = a, \qquad b \cdot b = b.$$

13. The set consists of integers 0, 1, 2, ..., n − 1, where n is a prime; the
operations are mod n addition and mod n multiplication.

The reader should note that one of the examples gives a ring with zero
divisors. Which example is it? What is the general form of zero divisors?


9. Multiplication of directed line segments by a number

We stress once again that an algebraic operation
was defined by us as an operation on two elements of the same
set. Numerous examples from physics suggest, however, that it is
sometimes reasonable to consider operations on elements of different
sets. One such operation is suggested by the concepts of
force, displacement, velocity and acceleration, and we shall again
use the example of directed line segments to consider it.

It has been customary for a long time now in physics to make
use of line segments. If, for example, a force is said to have increased
by a factor of five, then the line segment representing it is “extended”
by a factor of five without changing the general direction. If, however,
the force direction is said to have been reversed, then the initial
and terminal points of the corresponding line segment are interchanged.
Proceeding from these considerations we introduce multiplication
of a directed line segment by a real number.

We first discuss some general questions. Suppose an arbitrary 
straight line is given in the plane or in space. We agree to consider 
one of the directions on the line to be positive and the other to be
negative. A straight line on which a direction is specified will be
called an axis.

Suppose now that some axis is given and, in addition, a unit
line segment is indicated, which can be used to measure any other
line segment and thus determine its length. With every directed
line segment on the axis we associate its numerical characteristic,
the so-called magnitude of the directed line segment.

The magnitude {AB} of a directed line segment AB is a number
equal to the length of the line segment AB taken with a plus sign
if the direction of AB coincides with the positive direction of the
axis, and with a minus sign if the direction of AB coincides with
the negative direction of the axis. The magnitudes of all zero directed
line segments are considered equal to zero, i.e.

$$\{AA\} = 0.$$

Fig. 9.1

Regardless of which direction on the axis is taken to be positive,
AB is opposite in direction to BA and the lengths of AB and BA
are equal; consequently,

$$\{AB\} = -\{BA\}. \qquad (9.1)$$

The magnitude of a directed line segment, unlike its length, may
have any sign. Since the length of AB is the absolute value of its
magnitude, we shall use the symbol |AB| to designate it. It is
clear that in contrast to (9.1)

$$|AB| = |BA|.$$

Let A, B and C be any three points on the axis determining three
directed line segments AB, BC and AC. Whatever the location
of the points, the magnitudes of these directed line segments satisfy
the relation

$$\{AB\} + \{BC\} = \{AC\}. \qquad (9.2)$$

Indeed, let the direction of the axis and the location of the points
be such as in Fig. 9.1, for example. Then obviously

$$|CA| + |AB| = |CB|. \qquad (9.3)$$

According to the definition of the magnitude of a directed line
segment and equation (9.1)

$$|CA| = \{CA\} = -\{AC\}, \qquad |AB| = \{AB\}, \qquad |CB| = \{CB\} = -\{BC\}. \qquad (9.4)$$

Therefore (9.3) yields

$$-\{AC\} + \{AB\} = -\{BC\}$$

which coincides essentially with (9.2).

In our proof we used only relations (9.3) and (9.4) which depend 
only on relative positions of the points A, B and C on the axis 
and are independent of their coincidence or noncoincidence with 
one another. It is clear that for any other location of the points 
the proof is similar. 
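
Identities (9.1) and (9.2) can be tested for every relative position of the points at once. The sketch below assumes that each point of the axis is described by a single coordinate measured in unit line segments, a convention the text has not introduced; `magnitude`, `length` and `identities_hold` are hypothetical names.

```python
from itertools import product


def magnitude(a, b, positive=+1):
    """Magnitude {AB} for points with axis coordinates a and b.

    `positive` is +1 if the positive direction of the axis is the direction of
    increasing coordinate and -1 if it is the opposite one.
    """
    return positive * (b - a)


def length(a, b):
    """Length |AB| of the segment, which does not depend on the direction."""
    return abs(b - a)


def identities_hold(coords=range(-3, 4)):
    """Check (9.1), (9.2) and |AB| = |{AB}| for all placements, coincident or not."""
    for positive in (+1, -1):
        for a, b, c in product(coords, repeat=3):
            ab = magnitude(a, b, positive)
            bc = magnitude(b, c, positive)
            ac = magnitude(a, c, positive)
            if ab != -magnitude(b, a, positive):   # (9.1)
                return False
            if ab + bc != ac:                      # (9.2)
                return False
            if length(a, b) != abs(ab):
                return False
    return True


print(identities_hold())
```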

Identity (9.2) is the basic identity. In terms of the operation of
vector addition, for vectors on the same axis, it means that

$$\{AB + BC\} = \{AB\} + \{BC\}. \qquad (9.5)$$

The magnitude of a directed line segment determines the line
segment on the axis to within translation. But if we consider that
equal directed line segments are also determined to within translation,
then this means that the magnitude of a directed line segment
uniquely determines on a given axis the entire collection of equal
directed line segments.

Now let AB be a directed line segment and let $\alpha$ be a number.
The product $\alpha \cdot AB$ of the directed line segment AB by the real number $\alpha$
is a directed line segment lying on the axis through the points A
and B and having a magnitude equal to $\alpha \cdot \{AB\}$. Thus by definition

$$\{\alpha \cdot AB\} = \alpha \cdot \{AB\}. \qquad (9.6)$$


For any numbers $\alpha$ and $\beta$ and any directed line segments a and b
the multiplication of a directed line segment by a number possesses
the following properties:

$$1 \cdot a = a, \qquad \alpha (\beta a) = (\alpha \beta) a,$$
$$(\alpha + \beta) a = \alpha a + \beta a, \qquad \alpha (a + b) = \alpha a + \alpha b.$$

The first three properties are very simple. To prove them it suffices
to note that on the left- and right-hand sides of the equations we
have vectors lying on the same axis and use relations (9.5) and
(9.6). We prove the fourth property. Suppose for simplicity that
$\alpha > 0$. Apply vectors a and b to a common point and construct on
them a parallelogram whose diagonal is equal to a + b (Fig. 9.2). When
a and b are multiplied by $\alpha$, the parallelogram diagonal, by the
similitude of figures, is also multiplied by $\alpha$. But this means that
$\alpha a + \alpha b = \alpha (a + b)$.

Fig. 9.2
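
The four properties can also be checked numerically on an example. The sketch below assumes points given by coordinate pairs, realizes the product of a segment AB by a number as the segment from A to A + α(B − A), and uses exact rational numbers; these conventions and the helper names are introduced only for the illustration.

```python
from fractions import Fraction


def displacement(segment):
    """Coordinate differences from the initial point to the terminal point."""
    start, end = segment
    return tuple(e - s for s, e in zip(start, end))


def scale(alpha, segment):
    """Product of a directed segment by a number: same line, magnitude times alpha."""
    start, end = segment
    new_end = tuple(s + alpha * (e - s) for s, e in zip(start, end))
    return (start, new_end)


def add(u, v):
    """Triangle law: move v to the terminal point of u and join the free ends."""
    new_end = tuple(p + d for p, d in zip(u[1], displacement(v)))
    return (u[0], new_end)


def equal_vectors(u, v):
    """Equality of vectors up to translation."""
    return displacement(u) == displacement(v)


a = ((0, 0), (2, 1))
b = ((1, 1), (0, 4))
alpha = Fraction(-3, 2)
beta = Fraction(5)

print(equal_vectors(scale(1, a), a))
print(equal_vectors(scale(alpha, scale(beta, a)), scale(alpha * beta, a)))
print(equal_vectors(scale(alpha + beta, a), add(scale(alpha, a), scale(beta, a))))
print(equal_vectors(scale(alpha, add(a, b)), add(scale(alpha, a), scale(alpha, b))))
```

A negative value of alpha is used on purpose, so the check also covers the case that the proof above sets aside for simplicity.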

Note in conclusion that the magnitude
of a directed line segment may be treated as some “function”

$$\xi = \{x\} \qquad (9.7)$$

whose “independent variables” are vectors x of the same axis and
whose “values” are real numbers $\xi$, with

$$\{x + y\} = \{x\} + \{y\}, \qquad \{\lambda x\} = \lambda \{x\} \qquad (9.8)$$

for any vectors x and y on the axis and any number $\lambda$.


Exercises 

1. Prove that the result of multiplication by a number
does not depend on the choice of the positive direction on the axis.

2. Prove that the result of multiplication by a number does not depend on
the way the unit line segment is specified on the axis.

3. Prove that if we perform multiplication by a number defined on any set
of collinear line segments, the result will remain in the same set.

4. Prove that if we perform multiplication by a number defined on any set
of coplanar line segments, the result will remain in the same set.

5. What are the zero and the negative directed line segment in terms of
multiplication by a number?


10. Vector spaces 

Solving any problems reduces in the final 
analysis to the study of some sets and, in the first place, to the 
study of the structure of those sets. The structure of sets can be 
studied by various methods, for example, starting from the characteristic
property the elements possess, as it is done in problems
in constructing loci, or starting from the properties of operations if 
they are defined for the elements. 

The last method seems particularly tempting by virtue of its 
generality. Indeed, we have already seen at various times that 
various sets allow introduction of various operations possessing
nevertheless the same properties. It is obvious therefore that if in
investigating sets we obtain some result relying only on the properties
of the operation, then that result will hold in all sets where
operations possess the same properties. The specific nature of both 
the elements and the operations on them may be quite different. 

Somewhat earlier we introduced new mathematical objects, 
called directed line segments or vectors, and defined operations on 
them. It is well known that in fact there are quite real physical 
objects behind vectors. Therefore a detailed study of the structure 
of vector sets is of interest at least to physics. 

Even now we have three types of sets where operations have the 
same properties. These are the set of collinear vectors, the set of 
coplanar vectors and the set of vectors in the whole of space. In spite 
of the fact that the same operations are introduced in these sets 
we are justified in expecting that the structure of the sets must be 
different. 

There is some temptation, due to their simplicity, to study the 
sets relying only on the specific features of their elements. One 
cannot help noticing, however, that they have very much in common. 
It is therefore appropriate to attack them from some general positions,
in the hope of at least avoiding the tedious and monotonous
repetitions in going from one set to another. But in addition we hope 
of course that if we obtain some set with similar properties we shall 
be able to carry over all the results of the studies already made. 

We now list the familiar general facts about vectors forming any 
of the three sets in question. 

A. To every pair of vectors x and y there corresponds a vector
x + y, called the sum of x and y, and

(1) addition is commutative, $x + y = y + x$,

(2) addition is associative, $x + (y + z) = (x + y) + z$,

(3) there is a unique zero vector 0 such that $x + 0 = x$ for any
vector x,

(4) for every vector x there is a unique negative vector $-x$ such
that $x + (-x) = 0$.

B. To every pair $\alpha$ and x, where $\alpha$ is a number and x is a vector,
there corresponds a vector $\alpha x$, called the product of $\alpha$ and x, and

(1) multiplication by a number is associative, $\alpha (\beta x) = (\alpha \beta) x$,

(2) $1 \cdot x = x$ for any vector x.

C. Addition and multiplication are connected by the following
relations:

(1) multiplication by a number is distributive over vector addition,
$\alpha (x + y) = \alpha x + \alpha y$,

(2) multiplication by a vector is distributive over addition of
numbers, $(\alpha + \beta) x = \alpha x + \beta x$.

These facts, as in the case of the field, lay no claim to logical
independence. Properties A describe a set of vectors in terms of
addition and say that it is an Abelian group under addition. Properties
B describe a set of vectors in terms of multiplication of a vector
by a number. Properties C describe the relation of the two operations
to each other.

Now consider a set K and a field P of an arbitrary nature. We 
shall say that K is a linear or vector space over P if for all elements
of K addition and multiplication by a number from P are defined, 
axioms A, B and C holding. Using this terminology we can say that 
the set of collinear vectors, the set of coplanar vectors and the set 
of vectors in the whole of space are vector spaces over the field 
of real numbers. 
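As an illustration only, axioms A, B and C can be checked mechanically on a few sample elements. The sketch below is not part of the course: it assumes that vectors are modelled as pairs of rational numbers (Python `Fraction` values), and the names `add`, `scale`, `neg` and `check_axioms` are hypothetical. A finite check of this kind proves nothing in general; it can only fail to find a counterexample.

```python
from fractions import Fraction
from itertools import product

# Assumed model: a vector is a tuple of Fractions, the field P is the rationals.
def add(x, y):
    return tuple(a + b for a, b in zip(x, y))

def scale(alpha, x):
    return tuple(alpha * a for a in x)

def neg(x):
    return tuple(-a for a in x)

ZERO = (Fraction(0), Fraction(0))

def check_axioms(vectors, scalars):
    """Return True if axioms A, B and C hold on the given finite samples."""
    for x, y, z in product(vectors, repeat=3):
        if add(x, y) != add(y, x):                      # A(1)
            return False
        if add(x, add(y, z)) != add(add(x, y), z):      # A(2)
            return False
    for x in vectors:
        if add(x, ZERO) != x:                           # A(3)
            return False
        if add(x, neg(x)) != ZERO:                      # A(4)
            return False
        if scale(Fraction(1), x) != x:                  # B(2)
            return False
    for alpha, beta in product(scalars, repeat=2):
        for x, y in product(vectors, repeat=2):
            if scale(alpha, scale(beta, x)) != scale(alpha * beta, x):           # B(1)
                return False
            if scale(alpha, add(x, y)) != add(scale(alpha, x), scale(alpha, y)):  # C(1)
                return False
            if scale(alpha + beta, x) != add(scale(alpha, x), scale(beta, x)):    # C(2)
                return False
    return True

sample_vectors = [(Fraction(1), Fraction(2)), (Fraction(-3, 4), Fraction(5)), ZERO]
sample_scalars = [Fraction(0), Fraction(2, 3), Fraction(-7)]
axioms_hold = check_axioms(sample_vectors, sample_scalars)
```

Exact rational arithmetic is used here on purpose, so that the comparisons are not disturbed by rounding.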

The elements of any vector space will be called vectors although 
in their specific nature they may not at all resemble directed line 
segments. Geometrical ideas associated with the name “vectors” 
will help us understand and often anticipate the required results 
as well as help us find the not always obvious geometrical meaning 
in the various facts. 

Vectors of a vector space will as before be denoted by small Latin 
letters and numbers by small Greek letters. We shall call a vector 
space rational, real or complex according as the field P is the field
of rational, real or complex numbers, and denote it by D, R or C 
respectively. The fact that the name and notation lack any reference 
to the elements of the set has a deep meaning, but we shall discuss 
it much later. 

Before proceeding to a detailed study of vector spaces consider 
the simplest consequences following from the existence of addition 
and multiplication by a number. They will mainly concern the 
zero and negative vectors. 

In any vector space, for every element x

$$0 \cdot x = 0,$$

where at the right 0 stands for the zero vector and at the left 0 is
the number zero. To prove this relation consider the element $0 \cdot x + x$.
We have

$$0 \cdot x + x = 0 \cdot x + 1 \cdot x = (0 + 1) x = 1 \cdot x = x.$$

Consequently,

$$x = 0 \cdot x + x.$$

Adding to each side $-x$ we find

$$0 = x + (-x) = (0 \cdot x + x) + (-x) = 0 \cdot x + (x + (-x)) = 0 \cdot x + 0 = 0 \cdot x.$$

Now it is easy to show an explicit expression for the negative element
$-x$ in terms of the element x. Namely,

$$-x = (-1) x.$$

This formula follows from the simple relations

$$x + (-1) x = 1 \cdot x + (-1) x = (1 - 1) x = 0 \cdot x = 0.$$

This in turn establishes the relations

$$-(\alpha x) = (-\alpha) x = \alpha (-x),$$

since

$$-(\alpha x) = (-1) (\alpha x) = (-\alpha) x = \alpha ((-1) x) = \alpha (-x).$$

Recall that by the definition of the operation of subtraction
$x - y = x + (-y)$ for any vectors x and y. The explicit expression
for the negative vector shows the validity of distributive laws for
a difference too. Indeed, regardless of the numbers $\alpha$ and $\beta$ and the
vectors x and y we have

$$(\alpha - \beta) x = \alpha x + (-\beta) x = \alpha x + (-(\beta x)) = \alpha x - \beta x,$$

$$\alpha (x - y) = \alpha (x + (-1) y) = \alpha x + (-\alpha) y = \alpha x + (-(\alpha y)) = \alpha x - \alpha y.$$

It follows in particular that for any number $\alpha$

$$\alpha \cdot 0 = 0,$$

since

$$\alpha \cdot 0 = \alpha (x - x) = \alpha x - \alpha x = \alpha x + (-\alpha x) = 0.$$

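And, finally, the last consequence. If for some number $\alpha$ and some
vector x

$$\alpha x = 0, \qquad (10.1)$$

then either $\alpha = 0$ or $x = 0$. Indeed, if (10.1) holds, then there are
two possibilities: either $\alpha = 0$ or $\alpha \ne 0$. The case $\alpha = 0$ supports
our assertion. Now let $\alpha \ne 0$. Then

$$x = 1 \cdot x = \left(\frac{1}{\alpha} \cdot \alpha\right) x = \frac{1}{\alpha} (\alpha x) = \frac{1}{\alpha} \cdot 0 = 0.$$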

It follows in particular that in any vector space the common
nonzero multiplier can formally be cancelled from any equation,
whether it is a number or a vector. Indeed, if $\alpha x = \beta x$ and $x \ne 0$,
then $(\alpha - \beta) x = 0$ and hence $\alpha - \beta = 0$, i.e. $\alpha = \beta$. If $\alpha x = \alpha y$
and $\alpha \ne 0$, then $\alpha (x - y) = 0$ and hence $x - y = 0$, i.e. $x = y$.

So in terms of multiplication, addition and subtraction all the rules 
of equivalent transformations of algebraic expressions formally hold. 
We shall no longer state them explicitly in what follows. 

We have broached the subject of vector spaces. To conclude, it
is no mere chance that we have used a single notation for the properties
of operations in the field and in the vector space. There are
features of striking resemblance (as well as difference) between the
axioms of the field and those of the vector space over a field. The
reader should ponder on them.


Exercises

Prove that the following sets are vector spaces. Everywhere
the name of the operation reflects its content rather than notation.

1. The field consists of real numbers; the set consists of real numbers;
addition is the addition of real numbers; multiplication by a number is the
multiplication of a real number by a real number.

2. The field consists of real numbers; the set consists of complex numbers;
addition is the addition of complex numbers; multiplication by a number is
the multiplication of a complex number by a real number.

3. The field consists of rational numbers; the set consists of real numbers;
addition is the addition of real numbers; multiplication by a number is the
multiplication of a real number by a rational number.

4. The field is any number field; the set consists of a single vector a;
addition is defined by the rule $a + a = a$; multiplication of the vector a by
any number $\alpha$ is defined by the rule $\alpha a = a$.

5. The field consists of real numbers; the set consists of polynomials with
real coefficients in a single variable t, including constants; addition is the
addition of polynomials; multiplication by a number is the multiplication of a
polynomial by a real number.

6. The field consists of rational numbers; the set consists of numbers of the
form a + 6^/2 -f- cl/3 + dWZ, where a, b, c and d are rationals; addition is 
the addition of numbers of the indicated form; multiplication by a number is
the multiplication of a number of the indicated form by a rational number.

7. The field is any field; the set is the same field; addition is the addition of
elements (vectors!) of the field; multiplication by a number is the multiplication
of an element (a vector!) of the field by an element (a number!) of the field.

11. Finite sums and products 

Fields and vector spaces are the main sets
with which we shall have to deal in what follows. Two operations,
addition and multiplication, are introduced in these sets. If a large 
number of operations on elements is performed, then there appear 
expressions containing a considerable number of summands and 
factors. For notational convenience we introduce the appropriate 
symbols. It will be assumed that addition and multiplication are 
commutative and associative operations. 

Given a finite number of not necessarily different elements, assume 
that all the elements are numbered in some way and have indices 
taking on all consecutive values from some integer k to an integer p.
Denote elements by a single letter with index. The index may be 
placed anywhere in notation. It may be put in parentheses near 
the letter, at the bottom of the letter, at its top and so on. This is 
of no consequence. Most often we shall write it at the lower right 
of the letter. 

We shall denote the sum of elements $a_k, a_{k+1}, \ldots, a_p$ by the
symbol of the following form:

$$a_k + a_{k+1} + \ldots + a_p = \sum_{i=k}^{p} a_i. \qquad (11.1)$$

The index i in the formula is the summation index. Of course nothing
will happen if we denote it by any other letter. Sometimes under the
summation sign the collection of indices is explicitly stated over
which summation is carried out. For example, the sum in question
could be written as:

$$a_k + a_{k+1} + \ldots + a_p = \sum_{k \le i \le p} a_i.$$

It is obvious that if every element $a_i$ equals the product of an
element $b_i$ and an element $\alpha$, where $\alpha$ is independent of the
summation index i, then

$$\sum_{i=k}^{p} \alpha b_i = \alpha \sum_{i=k}^{p} b_i,$$

i.e. the multiplier independent of the summation index may be
taken outside the summation sign.

Suppose now that the elements have two indices, each changing
independently. We agree to use for these elements common notation
$a_{ij}$ and let, for example, $k \le i \le p$ and $m \le j \le n$. Arrange the
elements as a rectangular array:

$$\begin{matrix}
a_{km} & a_{k,m+1} & \ldots & a_{kn} \\
a_{k+1,m} & a_{k+1,m+1} & \ldots & a_{k+1,n} \\
\ldots & \ldots & \ldots & \ldots \\
a_{pm} & a_{p,m+1} & \ldots & a_{pn}
\end{matrix}$$

It is clear that whatever the order of summation the result will
be the same. Therefore, taking into account the above notation
for the sum, we have

$$(a_{km} + a_{k,m+1} + \ldots + a_{kn}) + (a_{k+1,m} + a_{k+1,m+1} + \ldots + a_{k+1,n}) + \ldots + (a_{pm} + a_{p,m+1} + \ldots + a_{pn})$$

$$= \sum_{j=m}^{n} a_{kj} + \sum_{j=m}^{n} a_{k+1,j} + \ldots + \sum_{j=m}^{n} a_{pj} = \sum_{i=k}^{p} \Bigl( \sum_{j=m}^{n} a_{ij} \Bigr).$$

On the other hand, the same sum equals

$$(a_{km} + a_{k+1,m} + \ldots + a_{pm}) + (a_{k,m+1} + a_{k+1,m+1} + \ldots + a_{p,m+1}) + \ldots + (a_{kn} + a_{k+1,n} + \ldots + a_{pn})$$

$$= \sum_{i=k}^{p} a_{im} + \sum_{i=k}^{p} a_{i,m+1} + \ldots + \sum_{i=k}^{p} a_{in} = \sum_{j=m}^{n} \Bigl( \sum_{i=k}^{p} a_{ij} \Bigr).$$

Consequently,

$$\sum_{i=k}^{p} \Bigl( \sum_{j=m}^{n} a_{ij} \Bigr) = \sum_{j=m}^{n} \Bigl( \sum_{i=k}^{p} a_{ij} \Bigr).$$

If we agree that we shall always take summation consecutively
over the summation indices arranged from right to left, then the
brackets may be dropped and we finally get

$$\sum_{i=k}^{p} \sum_{j=m}^{n} a_{ij} = \sum_{j=m}^{n} \sum_{i=k}^{p} a_{ij}.$$

This means that we may change the order of summation in summing
over two indices. If, for example, $a_{ij} = \alpha_i b_{ij}$, where $\alpha_i$ is independent
of the index j, then

$$\sum_{i=k}^{p} \sum_{j=m}^{n} \alpha_i b_{ij} = \sum_{i=k}^{p} \alpha_i \sum_{j=m}^{n} b_{ij}.$$

Similar results hold for sums over any finite number of indices. 
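The rules for double sums can be tried out on a small array. The following sketch is only an illustration under assumed data: the index ranges, the entries `a(i, j)`, and the factors `alpha(i)` and `b(i, j)` are arbitrary choices made for this example, and exact rational arithmetic is used so that both orders of summation can be compared for equality.

```python
from fractions import Fraction

k, p = 2, 4   # assumed range of the first index,  k <= i <= p
m, n = 1, 3   # assumed range of the second index, m <= j <= n

def a(i, j):
    # Arbitrary exact entries of the rectangular array.
    return Fraction(i, j + 1)

# Sum row by row, then column by column.
rows_first = sum(sum(a(i, j) for j in range(m, n + 1)) for i in range(k, p + 1))
cols_first = sum(sum(a(i, j) for i in range(k, p + 1)) for j in range(m, n + 1))

def alpha(i):
    # A factor that does not depend on the inner index j.
    return Fraction(1, i)

def b(i, j):
    return Fraction(i + j)

# The factor alpha(i) may be taken outside the inner summation sign.
inside = sum(sum(alpha(i) * b(i, j) for j in range(m, n + 1)) for i in range(k, p + 1))
outside = sum(alpha(i) * sum(b(i, j) for j in range(m, n + 1)) for i in range(k, p + 1))
```

Both pairs of values coincide exactly, as the identities above require.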

A product of elements $a_k, a_{k+1}, \ldots, a_p$ will be designated by
a symbol of the following form:

$$a_k a_{k+1} \ldots a_p = \prod_{i=k}^{p} a_i.$$

Now if $a_i = \alpha b_i$, then

$$\prod_{i=k}^{p} \alpha b_i = \alpha^{p-k+1} \prod_{i=k}^{p} b_i.$$

As in the case of summation, we may change the order of calculating
the products over two indices, i.e.

$$\prod_{i=k}^{p} \prod_{j=m}^{n} a_{ij} = \prod_{j=m}^{n} \prod_{i=k}^{p} a_{ij}.$$

All these facts can be proved according to the same scheme as in 
the case of summation of numbers. 
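The product rules admit the same kind of numerical check. In this sketch the index ranges, the common factor `alpha` and the elements `b[i]` and `a(i, j)` are assumed sample data; `math.prod` from the Python standard library plays the role of the product sign.

```python
from fractions import Fraction
from math import prod

k, p = 3, 7                      # assumed index range k <= i <= p
m, n = 1, 2                      # assumed range of the second index
alpha = Fraction(2, 3)           # common factor, independent of i
b = {i: Fraction(i, i + 1) for i in range(k, p + 1)}

# The common factor appears p - k + 1 times in the product.
lhs = prod(alpha * b[i] for i in range(k, p + 1))
rhs = alpha ** (p - k + 1) * prod(b[i] for i in range(k, p + 1))

def a(i, j):
    # Arbitrary nonzero exact entries with two indices.
    return Fraction(i + j, i)

# The order of the two products may be interchanged.
by_rows = prod(prod(a(i, j) for j in range(m, n + 1)) for i in range(k, p + 1))
by_cols = prod(prod(a(i, j) for i in range(k, p + 1)) for j in range(m, n + 1))
```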

Exercises 

Calculate the following expressions: 
n n n n 

2 c 2 <> 2 *’2 8 (*—c. 

i=l 1=1 i=»l 1-1 

n m n m n m p 

2 In, 2 2«+ 5 *>. 2 2 2< 2 '-‘> ! *. 

r=l ;'= 1 {=1 s=l i=l ;=1 ft = 1 

n n n m n m p 

n 2 , n io", n n ^ n n fi 2,+i+fc - 

1 = 1 p=l 1 = 1 ;=1 j=l ;=1 h=l

12. Approximate calculations

The sets discussed above are very widely used 
in various theoretical studies. To obtain a result it is almost always 
necessary to perform some operations on the elements of the sets. 
Especially frequently a need arises to carry out manipulations with 
elements of number fields. We want to note a very important feature 
of practical realizations of such computations. 

Suppose for definiteness that we are dealing with a field of real 
numbers. Let every number be represented as an infinite decimal 
fraction. Neither man nor the most modern computer can handle 
infinite fractions. In practice therefore every such fraction is replaced 
by a finite decimal fraction close to it or by a suitable rational 
number. 

So an exact real number is replaced by an approximate one. 
In theoretical studies implying precise assignment of numbers one 
expression or another is fairly often replaced by an equal expression, 
possibly written in another form. Of course in this case such a substitution
can raise neither any objections nor even questions. But if
we want to calculate some expression using approximate numbers, 
then the form of the expression is no longer irrelevant. 

Consider a simple example. It is easy to check that in the case
of the exact assignment of the number $\sqrt{2}$

$$(\sqrt{2} - 1)^6 = 99 - 70 \sqrt{2}. \qquad (12.1)$$

Since $\sqrt{2} = 1.4142\ldots$, the numbers $7/5 = 1.4$ and $17/12 = 1.4166\ldots$
may be considered to be approximate values for $\sqrt{2}$.
But substituting 7/5 on the left and right of (12.1) we get $0.00409\ldots$
and 1.0 respectively. For 17/12 we have $0.00523\ldots$ and $-0.1666\ldots$.
The results of the substitutions differ considerably, and it is not
immediately apparent which is closer to the truth. This shows how
careful one must be in handling approximate numbers.
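The substitutions in (12.1) are easy to repeat by machine. The sketch below is an illustration, not part of the course: it evaluates both sides of (12.1) in exact rational arithmetic at the two approximate values of $\sqrt{2}$, and the function names `left` and `right` are hypothetical.

```python
from fractions import Fraction

def left(v):
    # Left-hand side of (12.1) with v in place of the square root of 2.
    return (v - 1) ** 6

def right(v):
    # Right-hand side of (12.1) with v in place of the square root of 2.
    return 99 - 70 * v

for approx in (Fraction(7, 5), Fraction(17, 12)):
    # Print the approximation and both sides as decimal fractions.
    print(approx, float(left(approx)), float(right(approx)))
```

Because the arithmetic is exact, any disagreement between the two columns comes only from replacing $\sqrt{2}$ by a nearby rational number, not from rounding during the computation.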

We have discussed only one possible source of approximate numbers,
the rounding of exact numbers. In fact there are many other
sources. For example, initial data for calculations often result from 
experiments and every experiment may produce a result only to 
a limited precision. Even in such simple operations as multiplication
and division the number of digits in fractions may greatly
increase. We are compelled therefore to discard part of the digits 
in the results of intermediate calculations, i.e. we are again compelled 
to replace some numbers by approximate ones and so on. 

A detailed study of operations with approximate numbers is 
beyond the scope of this course. However, we shall fairly frequently 
return to the discussion of the difference between theoretical and
practical calculations. The need for such a discussion arises from
the fact that theoretical calculations cannot as a rule be realized in 
exact form. 
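One small experiment shows what discarding digits in intermediate results can do. It is a sketch under stated assumptions: every sum is rounded to three significant digits with the round-half-even rule, using the `decimal` module of the Python standard library, and the three sample numbers are chosen only for this illustration.

```python
from decimal import Decimal, Context, ROUND_HALF_EVEN

# Assumed rounding rule: keep 3 significant digits after every operation.
ctx = Context(prec=3, rounding=ROUND_HALF_EVEN)

a, b, c = Decimal("1.00"), Decimal("0.004"), Decimal("0.004")

left_grouping = ctx.add(ctx.add(a, b), c)    # (a + b) + c, rounded after each step
right_grouping = ctx.add(a, ctx.add(b, c))   # a + (b + c), rounded after each step

print(left_grouping, right_grouping)
```

With exact numbers the two groupings would of course give the same sum; here the small terms are lost in one grouping and survive in the other.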


Exercises 

1. What finite decimal fraction must be used to approximate
$\sqrt{2}$ for the first six digits to coincide in the results of computing the
left- and right-hand sides of (12.1)?

2. Let the result of each operation on two real numbers be rounded according
to any rule you know to t decimal places. Are the commutative and associative
properties of the operations preserved?

3. Will the distributive laws hold under the hypotheses of the preceding 
exercise? 

4. To what conclusion do you come if the answer in Exercises 2 and 3 is no? 



CHAPTER 2 


The Structure 
of a Vector Space 


13. Linear combinations and spans 

Let $e_1, e_2, \ldots, e_n$ be a finite number of
arbitrary, not necessarily distinct vectors from a vector space K
over a field P. We shall call them a system of vectors. One system
of vectors is said to be a subsystem of a second system if the first
system contains only some vectors of the second and no other vectors.

The vectors of a given system and those obtained from them will 
be subjected to the operations of addition and multiplication by 
a number. It is clear that any vector x of the form

$$x = \alpha_1 e_1 + \alpha_2 e_2 + \ldots + \alpha_n e_n, \qquad (13.1)$$

where $\alpha_1, \alpha_2, \ldots, \alpha_n$ are some numbers from P, is obtained from
the vectors of a given system $e_1, e_2, \ldots, e_n$ with the aid of the
two operations. Moreover, in whatever order the operations are
performed, we shall obtain vectors of the form (13.1).

A vector x in (13.1) is said to be linearly expressible in terms of
the vectors $e_1, e_2, \ldots, e_n$. The right-hand side of (13.1) is called
a linear combination of these vectors and the numbers $\alpha_1, \alpha_2, \ldots, \alpha_n$
are the coefficients of the linear combination.

Fix a system of vectors $e_1, e_2, \ldots, e_n$ and allow the coefficients
of linear combinations to take on any values from the field P. This
will determine some set of vectors in K. This set is the span of vectors
$e_1, e_2, \ldots, e_n$ and is designated $L(e_1, e_2, \ldots, e_n)$.
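To make formula (13.1) concrete, here is a minimal sketch that forms a linear combination of vectors given by coordinates. It rests on assumptions not made in the text: vectors are modelled as tuples of equal length, the field P is the field of rational numbers, and the name `linear_combination` is hypothetical.

```python
from fractions import Fraction

def linear_combination(coeffs, vectors):
    """Return alpha_1 e_1 + ... + alpha_n e_n for vectors given as equal-length tuples."""
    if len(coeffs) != len(vectors):
        raise ValueError("need exactly one coefficient per vector")
    dim = len(vectors[0])
    result = [Fraction(0)] * dim
    for alpha, e in zip(coeffs, vectors):
        for idx in range(dim):
            # Multiply the vector by its coefficient and add it to the running sum.
            result[idx] += alpha * e[idx]
    return tuple(result)

e1, e2 = (1, 0, 2), (0, 1, -1)
x = linear_combination([Fraction(3), Fraction(-1, 2)], [e1, e2])
zero = linear_combination([Fraction(0), Fraction(0)], [e1, e2])
```

Letting the coefficients run over all of P, the values of `linear_combination` run over the span of the chosen system; the zero coefficients give the zero vector.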

Our interest in spans is accounted for by two circumstances. First, 
any span has a simple structure, being a collection of all linear 
combinations of vectors of a given system. Second, the span of any 
system of vectors from any vector space is itself a vector space. 

Indeed, all the axioms of a vector space are almost obvious. 
Some explanation may only be required by the axioms relating 
to the zero and the negative vector. The zero vector is clearly in 
any span and corresponds to the zero values of the coefficients of 
a linear combination, i.e. 

$$0 = 0 \cdot e_1 + 0 \cdot e_2 + \ldots + 0 \cdot e_n.$$

The negative of (13.1) is

$$-x = (-\alpha_1) e_1 + (-\alpha_2) e_2 + \ldots + (-\alpha_n) e_n.$$

The uniqueness of the zero and negative vectors follows from their
uniqueness as vectors of the vector space K.

Notice that the span of vectors $e_1, e_2, \ldots, e_n$ is the “smallest”
vector space containing those vectors. Indeed, the span consists
of only linear combinations of vectors $e_1, e_2, \ldots, e_n$ and any
vector space containing $e_1, e_2, \ldots, e_n$ must contain all their
linear combinations.

So any vector space contains in the general case an infinite number 
of other vector spaces, the spans. Now the following questions 
arise: 

What are the conditions under which the spans of two distinct 
systems of vectors consist of the same vectors of the original space? 
What minimum number of vectors determines the same span? 
Is the original vector space the span of some of its vectors? 
We shall soon get answers to these and other questions. To do
this a very wide use will be made of the concept of linear combination,
and in particular of its transitive property. Namely, if some
vector z is a linear combination of vectors $x_1, x_2, \ldots, x_r$ and
each of them in turn is a linear combination of vectors $y_1, y_2, \ldots, y_s$,
then z too may be represented as a linear combination of
$y_1, y_2, \ldots, y_s$. We prove this property. Let

$$z = \sum_{i=1}^{r} \beta_i x_i \qquad (13.2)$$

and in addition for every index i, $1 \le i \le r$, let

$$x_i = \sum_{j=1}^{s} \gamma_{ij} y_j,$$

where $\beta_i$ and $\gamma_{ij}$ are some numbers from P.

Substituting the expression for $x_i$ on the right of (13.2) and using
the corresponding properties of finite sums we get

$$z = \sum_{i=1}^{r} \beta_i x_i = \sum_{i=1}^{r} \beta_i \sum_{j=1}^{s} \gamma_{ij} y_j = \sum_{i=1}^{r} \sum_{j=1}^{s} \beta_i \gamma_{ij} y_j$$

$$= \sum_{j=1}^{s} \sum_{i=1}^{r} \beta_i \gamma_{ij} y_j = \sum_{j=1}^{s} \Bigl( \sum_{i=1}^{r} \beta_i \gamma_{ij} \Bigr) y_j = \sum_{j=1}^{s} \lambda_j y_j,$$

where the coefficients $\lambda_j$ denote the following expressions:

$$\lambda_j = \sum_{i=1}^{r} \beta_i \gamma_{ij}.$$
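The computation of the coefficients $\lambda_j$ can be followed on a small numerical case. In this sketch the vectors $y_j$ and the numbers $\beta_i$, $\gamma_{ij}$ are assumed sample data with r = 3 and s = 2, vectors are modelled as tuples, and the helper name `combine` is hypothetical.

```python
from fractions import Fraction

def combine(coeffs, vectors):
    # Linear combination of coordinate vectors with the given coefficients.
    dim = len(vectors[0])
    return tuple(sum(c * v[t] for c, v in zip(coeffs, vectors)) for t in range(dim))

y = [(1, 0, 1), (0, 2, 1)]                        # y_1, y_2
gamma = [[Fraction(1), Fraction(2)],              # x_1 = 1*y_1 + 2*y_2
         [Fraction(-1), Fraction(1, 2)],          # x_2 = -1*y_1 + (1/2)*y_2
         [Fraction(0), Fraction(3)]]              # x_3 = 0*y_1 + 3*y_2
beta = [Fraction(2), Fraction(4), Fraction(-1)]   # z = 2*x_1 + 4*x_2 - x_3

x = [combine(row, y) for row in gamma]
z_via_x = combine(beta, x)

# lambda_j is the sum over i of beta_i * gamma_ij.
lam = [sum(beta[i] * gamma[i][j] for i in range(len(beta))) for j in range(len(y))]
z_via_y = combine(lam, y)
```

Both routes give the same vector z, once through the intermediate vectors $x_i$ and once directly through the vectors $y_j$.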

So the concept of linear combination is indeed transitive.


Exercises

1. What are in a space of directed line segments the 
spans of systems of one, two, three and a larger number of directed line segments? 

2. Consider a vector space of polynomials in t over the field of real numbers. 
What is the span of the system of vectors $t^2 + 1$, $t^2 + t$ and 1?

3. In what space do all spans coincide with the space? 

4. Prove that the vector space of all directed line segments cannot be the 
span of any two directed line segments. 


14. Linear dependence 

Consider again arbitrary vectors $e_1, e_2, \ldots, e_n$
in a vector space. It may happen that one of them is linearly
expressible in terms of the others. For example, let it be $e_1$. Then
each vector of $e_1, e_2, \ldots, e_n$ is linearly expressible in terms of
$e_2, e_3, \ldots, e_n$. Therefore any linear combination of vectors
$e_1, e_2, \ldots, e_n$ is also a linear combination of vectors
$e_2, e_3, \ldots, e_n$. Consequently, the spans of the vectors $e_1, e_2, \ldots, e_n$
and $e_2, e_3, \ldots, e_n$ coincide.

Suppose further that among the vectors $e_2, e_3, \ldots, e_n$ there is
some vector, say, $e_2$ which is also linearly expressible in terms of
the rest. Repeating our reasoning we conclude that now any linear
combination of vectors $e_1, e_2, \ldots, e_n$ is also a linear combination
of $e_3, e_4, \ldots, e_n$. Continuing this process we finally come from
the system $e_1, e_2, \ldots, e_n$ to a system from which none of the vectors
can any longer be eliminated. The span of the new system of vectors
obviously coincides with that of the vectors $e_1, e_2, \ldots, e_n$. In
addition we can say that if there were at least one nonzero vector
among $e_1, e_2, \ldots, e_n$, then the new system of vectors would either
consist of only one nonzero vector or none of its vectors would be linearly
expressible in terms of the others.

Such a system of vectors is called linearly independent. 

If a system of vectors is not linearly independent, then it is said 
to be linearly dependent. In particular, by definition a system consisting
of a zero vector alone is linearly dependent. Linear dependence
or independence are properties of a system of vectors. Nevertheless 
the corresponding adjectives are very often used to refer to the 
vectors themselves. Instead of a “linearly independent system of 
vectors” we shall sometimes say a “system of linearly independent 
vectors” and so on. 

In terms of the notions just introduced this means that we proved

Lemma 14.1. If not all of the vectors $e_1, e_2, \ldots, e_n$ are zero and
this system is linearly dependent, then we can find in it a linearly
independent subsystem of vectors in terms of which any of the vectors
$e_1, e_2, \ldots, e_n$ is linearly expressible.

Whether the system of vectors $e_1, e_2, \ldots, e_n$ is linearly dependent
or linearly independent is determined by one, seemingly unexpected,
fact. We have already noted that the zero vector is in the
span and is clearly a linear combination (13.1) with zero values
of the coefficients. In spite of this it may also be linearly expressible in
terms of the vectors $e_1, e_2, \ldots, e_n$ in other ways, i.e. be defined
by another set of the coefficients of a linear combination. The linear
independence of $e_1, e_2, \ldots, e_n$ is very closely related to the uniqueness
of representing the zero element in terms of them. Namely,
we have

Theorem 14.1. A system of vectors $e_1, e_2, \ldots, e_n$ is linearly independent
if and only if

$$\alpha_1 e_1 + \alpha_2 e_2 + \ldots + \alpha_n e_n = 0 \qquad (14.1)$$

implies the equality to zero of all the coefficients of the linear combination.

Proof. Let n = 1. If $e_1 \ne 0$, then, as already noted above, $\alpha_1 e_1 = 0$
must yield $\alpha_1 = 0$. But if it follows from $\alpha_1 e_1 = 0$ that $\alpha_1$
is zero, then $e_1$ obviously cannot be zero.

Consider now the case $n \ge 2$. Let a system of vectors be linearly
independent. Suppose (14.1) is true for some set of coefficients
among which there is at least one different from zero. For example,
let $\alpha_1 \ne 0$. Then (14.1) yields

$$e_1 = \left(-\frac{\alpha_2}{\alpha_1}\right) e_2 + \left(-\frac{\alpha_3}{\alpha_1}\right) e_3 + \ldots + \left(-\frac{\alpha_n}{\alpha_1}\right) e_n,$$

i.e. $e_1$ is linearly expressible in terms of the other vectors of the
system. This contradicts the condition that the system be linearly
independent, and therefore it is impossible that there should be
nonzero coefficients among those satisfying (14.1).

If (14.1) implies that all coefficients are equal to zero, then the
system of vectors cannot be linearly dependent. Indeed, suppose
the contrary and let $e_1$, for example, be linearly expressible in terms
of the other vectors, i.e. let

$$e_1 = \beta_2 e_2 + \beta_3 e_3 + \ldots + \beta_n e_n.$$

Then (14.1) will clearly hold for the coefficients $\alpha_1 = -1$, $\alpha_2 = \beta_2$,
$\ldots$, $\alpha_n = \beta_n$ among which at least one is not equal to
zero. Thus the theorem is proved.

This theorem is so widely used in various studies that it is most
often regarded just as the definition of linear independence.
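For vectors given by rational coordinates, the criterion of Theorem 14.1 can be tested by machine. The sketch below is an assumption-laden illustration: it uses elimination on the rows of coordinates, a technique that has not been introduced at this point of the course, and the names `rank` and `is_linearly_independent` are hypothetical. The system is reported independent exactly when elimination leaves no zero row, i.e. when (14.1) has only the zero solution.

```python
from fractions import Fraction

def rank(vectors):
    """Number of nonzero rows left after elimination over the rationals."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(ncols):
        # Look for a row at or below position r with a nonzero entry in this column.
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            # Subtract a multiple of the pivot row to clear the column entry.
            rows[i] = [x - factor * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    # Independent exactly when no vector is eliminated to zero.
    return rank(vectors) == len(vectors)
```

For example, `is_linearly_independent([(1, 2), (2, 4)])` is false, since the second vector is twice the first, while a system consisting of the zero vector alone is reported dependent, in agreement with the definition above.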

Note two simple properties of systems of vectors associated with 
linear independence. 

Lemma 14.2. If some of the vectors $e_1, e_2, \ldots, e_n$ are linearly
dependent, then so is the entire system $e_1, e_2, \ldots, e_n$.

Proof. We may assume without loss of generality that it is the
first vectors $e_1, e_2, \ldots, e_k$ that are linearly dependent. Consequently,
there are numbers $\alpha_1, \alpha_2, \ldots, \alpha_k$, not all zero, such that

$$\alpha_1 e_1 + \alpha_2 e_2 + \ldots + \alpha_k e_k = 0.$$

This yields

$$\alpha_1 e_1 + \alpha_2 e_2 + \ldots + \alpha_k e_k + 0 \cdot e_{k+1} + \ldots + 0 \cdot e_n = 0.$$

But this equation implies the linear dependence of $e_1, e_2, \ldots, e_n$
since there are nonzero numbers among $\alpha_1, \alpha_2, \ldots, \alpha_k, 0, \ldots, 0$.

Lemma 14.3. If there is at least one zero vector among $e_1, e_2, \ldots, e_n$,
then the entire system $e_1, e_2, \ldots, e_n$ is linearly dependent.

Proof. Indeed, a system of one zero vector is linearly dependent.
Therefore it follows from the property just proved that the entire
system is linearly dependent.

The following theorem is the most important result relating to 
linear dependence: 

Theorem 14.2. Vectors $e_1, e_2, \ldots, e_n$ are linearly dependent if and
only if either $e_1 = 0$ or some vector $e_k$, $2 \le k \le n$, is a linear combination
of the preceding vectors.

Proof. Suppose $e_1, e_2, \ldots, e_n$ are linearly dependent. Then in
(14.1) not all coefficients are zero. Let the last nonzero coefficient
be $\alpha_k$. If k = 1, then this means that $e_1 = 0$. Now let k > 1. Then
from

$$\alpha_1 e_1 + \alpha_2 e_2 + \ldots + \alpha_k e_k = 0$$

we find that

$$e_k = \left(-\frac{\alpha_1}{\alpha_k}\right) e_1 + \left(-\frac{\alpha_2}{\alpha_k}\right) e_2 + \ldots + \left(-\frac{\alpha_{k-1}}{\alpha_k}\right) e_{k-1}.$$

This proves the necessity of the statement formulated in the
theorem. Sufficiency is obvious since both the case where $e_1 = 0$
and the case where $e_k$ is linearly expressible in terms of the preceding
vectors imply the linear dependence of the first vectors in $e_1, e_2, \ldots, e_n$.
But this implies the linear dependence of the entire system
of vectors.
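Theorem 14.2 suggests a simple scan through the system from left to right. The following sketch, for vectors given by rational coordinates, returns the number k of the first vector that is zero or is a linear combination of the preceding ones. The elimination used to decide this and the name `first_dependent_index` are assumptions made for the illustration, not the method of the text.

```python
from fractions import Fraction

def first_dependent_index(vectors):
    """Return the 1-based number k of the first vector that is zero (k = 1) or a
    linear combination of the preceding vectors; return None if there is none."""
    basis = []  # pairs (pivot position, reduced vector) built from earlier vectors
    for k, v in enumerate(vectors, start=1):
        w = [Fraction(c) for c in v]
        for pos, b in basis:
            if w[pos] != 0:
                factor = w[pos] / b[pos]
                # Remove from w its component along the earlier reduced vector b.
                w = [wi - factor * bi for wi, bi in zip(w, b)]
        pos = next((t for t, c in enumerate(w) if c != 0), None)
        if pos is None:
            # Nothing is left of w, so it is expressible in terms of its predecessors.
            return k
        basis.append((pos, w))
    return None
```

By the theorem, the system is linearly dependent exactly when the function returns a number, and linearly independent when it returns `None`.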


Exercises 

1. Prove that if any vector of a vector space can be
uniquely represented as a linear combination of vectors $e_1, e_2, \ldots, e_n$, then that
system of vectors is linearly independent.

2. Prove that if a system of vectors $e_1, e_2, \ldots, e_n$ is linearly independent,
then any vector of the span of those vectors can be uniquely represented as a
linear combination of them.

3. Prove that a system of vectors $e_1, e_2, \ldots, e_n$ is linearly dependent if and
only if either $e_n = 0$ or some vector $e_k$, $1 \le k \le n - 1$, is a linear combination
of the subsequent vectors.

4. Consider a vector space of polynomials in a variable t over the field of
real numbers. Prove that the system of vectors $1, t, t^2, \ldots, t^n$ is linearly
independent for any n.

5. Prove that a system of two noncollinear directed line segments is linearly
independent.


15. Equivalent systems of vectors 

Consider two systems of vectors of a vector 
space K. Suppose their spans coincide and constitute some set L. 
Any vector of each system is clearly in L and in addition each vector 
of L can be represented in this case as a linear combination of both 
the vectors of one system and those of the other. Consequently: 

Two systems of vectors possess the property that any vector of each 
system is linearly expressible in terms of the vectors of the other. 

Such systems are called equivalent. 

It follows from the foregoing that if the spans of two systems of 
vectors coincide, then those systems are equivalent. Now let any 
two equivalent systems be given. Then by the transitivity of the 
concept of linear combination any linear combination of vectors of 
one system can be represented as a linear combination of vectors of 
the other system, i.e. the spans of both systems coincide. So we 
have 

Lemma 15.1. For the spans of two systems of vectors to coincide it 
is necessary and sufficient that those systems should be equivalent. 
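For systems of vectors with rational coordinates, the equivalence of two systems can be tested directly from the definition: every vector of each system must be linearly expressible in terms of the other system. The sketch below is only an illustration under that assumption; the elimination used to test expressibility and the names `reduce_against`, `build_basis`, `expressible` and `equivalent` are hypothetical.

```python
from fractions import Fraction

def reduce_against(v, basis):
    # Subtract from v its components along the reduced vectors of the basis.
    w = [Fraction(c) for c in v]
    for pos, b in basis:
        if w[pos] != 0:
            factor = w[pos] / b[pos]
            w = [wi - factor * bi for wi, bi in zip(w, b)]
    return w

def build_basis(system):
    # Keep only the vectors that are not expressible in terms of earlier ones.
    basis = []
    for v in system:
        w = reduce_against(v, basis)
        pos = next((t for t, c in enumerate(w) if c != 0), None)
        if pos is not None:
            basis.append((pos, w))
    return basis

def expressible(v, system):
    # v lies in the span of the system exactly when nothing is left after reduction.
    return all(c == 0 for c in reduce_against(v, build_basis(system)))

def equivalent(first, second):
    return (all(expressible(v, second) for v in first)
            and all(expressible(v, first) for v in second))
```

Note that equivalent systems need not contain the same number of vectors: under these definitions the system (1, 0), (0, 1) is equivalent to the system (1, 1), (1, -1), (2, 0).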

Notice that the concept of equivalence of two systems of vectors is 
an equivalence relation. Reflexivity is obvious since any system is 
equivalent to itself, symmetry follows from the definition of equivalent
systems, and the transitivity of the notion follows from that of
the concept of linear combination. Therefore the set of all systems of 
vectors of any vector space can be divided into classes of equivalent 
systems. It is important to stress that all the systems of the same 
class have the same span. 

Nothing can be said in the general case about the number of 
vectors in equivalent systems. But if at least one of two equivalent 
systems is linearly independent, then it is possible to make quite 
definite conclusions concerning the number of vectors. They are 
based on 

Theorem 15.1. If each of the vectors in a linearly independent system
$e_1, e_2, \ldots, e_n$ is linearly expressible in terms of vectors $y_1, y_2, \ldots, y_m$,
then $n \le m$.

Proof. Under the hypothesis of the theorem $e_n$ is linearly expressible
in terms of $y_1, y_2, \ldots, y_m$ and hence the system

$$e_n, y_1, y_2, \ldots, y_m \qquad (15.1)$$

is linearly dependent. The vector $e_n$ is not equal to zero and therefore
by Theorem 14.2 some vector $y_i$ in (15.1) is a linear combination
of the preceding vectors. On eliminating this vector we obtain the
following system:

$$e_n, y_1, \ldots, y_{i-1}, y_{i+1}, \ldots, y_m. \qquad (15.2)$$

Using the transitivity of the concept of linear combination it is
now easy to show that each of the vectors $e_1, e_2, \ldots, e_n$ is linearly
expressible in terms of vectors (15.2).

We join to vectors (15.2) on the left a vector e n _ x . We again con¬ 
clude that the system 

&n -It i !/li • • ’i yi + 1, • • •» Vm (15.3) 

is linearly dependent. The vector e„_j is not equal to zero and there¬ 
fore by Theorem 14.2 one of the other vectors (15.3) is a linear com¬ 
bination of the preceding vectors. This vector cannot be e n since 
this would imply the linear dependence of the system of two vectors 
e n _ l , e n and hence of the entire system of vectors e lt e 2 , . . ., e„. 
Thus some vector yj in (15.3) is linearly expressible in terms of the 
preceding ones. If we eliminate it, then we again obtain a system 
in terms of which each of the vectors e v e 2 . . . ., e n is linearly expres¬ 
sible. 

Continuing this process notice that the vectors y u y 2 , . . ., y m 
cannot be exhausted before we have joined all vectors e,, e 2 , . . ., e„. 
Otherwise it will turn out that each of the vectors e lt e 2 , . . ., e n 
is linearly expressible in terms of some of the vectors of the same 
system, i.e. that the entire system must be linearly dependent. 
Since this contradicts the hypothesis of the theorem, it follows that 
n ^ m. 

Consider consequences of the theorem. Suppose we are given two 
equivalent linearly independent systems of vectors. By Theorem 15.1 
each of the systems contains at most as many vectors as the other. 
Consequently: 

Equivalent linearly independent systems consist of the same number 
of vectors. 

Take further n arbitrary vectors, construct on them the span and 
choose on it any n + 1 vectors. Since the number of those vectors is 
greater than that of the given vectors, they cannot be linearly 
independent. Therefore: 

Any n + 1 vectors in the span of a system of n vectors are linearly 
dependent. 

In terms of equivalent systems Lemma 14.1 implies that whatever 
a system of vectors not all equal to zero may be there is an equivalent 
linearly independent subsystem in it. This subsystem is called a 
basis of the original system. 

Of course, any system may have more than one basis. All the 
bases of equivalent systems are themselves equivalent systems. It 
follows from the first consequence of Theorem 15.1 that they consist 
of the same number of vectors. That number is a characteristic of
all equivalent systems and is called their rank. By definition the 
rank of systems of zero vectors is considered to be equal to zero. 
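
The notions of rank and of equivalent systems can be tried out numerically. The sketch below is only an illustration under stated assumptions: vectors are taken to be tuples of rational numbers, and the helper names `rank` and `equivalent` are hypothetical, not notation from the text. The rank is found by Gaussian elimination, and two systems are treated as equivalent when joining them raises neither rank, which is one way of saying that their spans coincide.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a system of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0  # number of pivot rows found so far
    for col in range(len(rows[0]) if rows else 0):
        # look for a row, not yet used as a pivot, with a nonzero entry in this column
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def equivalent(first, second):
    """Two systems are equivalent when joining them raises neither rank."""
    return rank(first) == rank(second) == rank(list(first) + list(second))

a = [(1, 0, 0), (0, 1, 0), (1, 1, 0)]  # three vectors in one plane
b = [(1, 1, 0), (1, -1, 0)]            # a linearly independent system in the same plane
print(rank(a), rank(b), equivalent(a, b))
print(rank([(0, 0, 0), (0, 0, 0)]))    # a system of zero vectors
```

Exact rational arithmetic is used here so that the test for a zero entry is reliable; with floating-point numbers the same elimination would need a tolerance.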

Consider now two linearly independent systems consisting of the 
same number of vectors. Replace some vector of one system by some 
vector of the other. In the resulting system again replace a vector of 
the first system by some of the remaining vectors of the second 
system and so on. The replacement process is carried on until one 
system is replaced by the other. If replacement is carried out in an 
arbitrary manner, then the intermediate systems may turn out to be 
linearly dependent. However, we have 

Theorem 15.2. The process of successive replacement may be carried 
out so that intermediate systems will all be linearly independent. 

Proof. Let $y_1, y_2, \dots, y_n$ and $z_1, z_2, \dots, z_n$ be two linearly
independent systems of vectors. Suppose k steps of the process have
been carried out, with $k \ge 0$. We may assume without loss of generality
that the vectors $y_1, \dots, y_k$ have been replaced by $z_1, \dots, z_k$
and that all the systems obtained, including the system

$z_1, \dots, z_k, y_{k+1}, \dots, y_n$,

are linearly independent. This assumption obviously holds for
k = 0.

Suppose further that when $y_{k+1}$ is replaced by any of the vectors
$z_{k+1}, \dots, z_n$ all systems

$z_1, \dots, z_k, z_i, y_{k+2}, \dots, y_n$

are linearly dependent for $i = k + 1, \dots, n$. Since the system

$z_1, \dots, z_k, y_{k+2}, \dots, y_n$ (15.4)

is linearly independent, it follows that the vectors $z_i$ for
$i = k + 1, \dots, n$ are linearly expressible in terms of it. But so are
the vectors $z_i$ for $i = 1, 2, \dots, k$. Consequently, all vectors
$z_1, \dots, z_n$ must be linearly expressible in terms of (15.4), which
contains only n - 1 vectors. This is impossible by virtue
of Theorem 15.1. Therefore the replacement process indicated in
Theorem 15.2 can indeed be carried out.
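
The replacement process can be sketched as a small program. This is an illustration only, assuming vectors are tuples of rational numbers; the names `rank` and `replace_keeping_independence` are hypothetical. At each step the next vector of the first system is replaced by the first unused vector of the second system that keeps the system linearly independent. The proof above is what guarantees that such a vector exists.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a system of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def replace_keeping_independence(ys, zs):
    """Replace the vectors of ys by those of zs one at a time.

    Returns the list of intermediate systems, each linearly independent.
    """
    current = list(ys)
    remaining = list(zs)
    steps = []
    for k in range(len(ys)):
        for z in remaining:
            candidate = current[:k] + [z] + current[k + 1:]
            if rank(candidate) == len(candidate):  # still linearly independent
                current = candidate
                remaining.remove(z)
                steps.append(list(current))
                break
        else:
            raise ValueError("the given systems are not both linearly independent")
    return steps

ys = [(1, 0), (0, 1)]
zs = [(0, 1), (1, 0)]
for system in replace_keeping_independence(ys, zs):
    print(system)
```

In this example replacing the first vector of `ys` by the first vector of `zs` would give two equal vectors, so the sketch passes over it and uses the second vector of `zs` instead. This is the situation in which an arbitrary order of replacement fails.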

Exercises 

Prove that the following transformations of a system 
of vectors, called elementary, result in an equivalent system. 

1. Addition to a system of vectors of any linear combination of those vectors. 

2. Elimination from a system of vectors of any vector which is a linear
combination of the remaining vectors.

3. Multiplication of any vector of a system by a number other than zero. 

4. Addition to any vector of a system of any linear combination of the
remaining vectors.

5. Interchanging of two vectors. 
16. The basis

Suppose we are given a vector space consisting 
of not only a zero vector. In such a space there is clearly at least one 
nonzero vector and consequently there is a linearly independent 
system of at least one vector. There are two possibilities now: either 
there is a linearly independent system containing an arbitrarily 
large number of vectors or there is a linearly independent system 
containing a maximum number of vectors. In the former case the 
vector space is called infinite dimensional and in the latter it is 
called finite dimensional. 

With the exception of some occasional examples, our attention will
be devoted throughout this book to finite dimensional spaces. In
particular, a finite dimensional vector space is any span constructed on
a finite number of vectors of an arbitrary (not necessarily finite
dimensional) space.

So let vectors $e_1, e_2, \dots, e_n$ constitute in a finite dimensional
vector space K a linearly independent system with a maximum
number of vectors. This means that for any vector x in K the system
$e_1, e_2, \dots, e_n, x$ will be linearly dependent. By Theorem 14.2 the
vector x is linearly expressible in terms of $e_1, e_2, \dots, e_n$.
Since x is arbitrary and $e_1, e_2, \dots, e_n$ are fixed, we may say
that

Any finite dimensional vector space is the span of a finite number of
its vectors.

In studying finite dimensional vector spaces now we can use any 
properties relating to spans and equivalent systems of vectors. We 
introduce the following definition: 

A linearly independent system of vectors in terms of which each 
vector of a space is expressible is called a basis of the space. 

Our concept of basis is associated with a linearly independent 
system containing a maximum number of vectors. It is obvious, 
however, that all bases of the same finite dimensional vector space 
are equivalent linearly independent systems. As we know, such 
systems contain the same number of vectors. Therefore the number 
of vectors in a basis is a characteristic of a finite dimensional vector 
space. This number is called the dimension of a vector space K and 
designated dim K. If dim K = n, then the space K is n-dimensional. 
It is clear that: 

In an n-dimensional vector space any linearly independent system 
of n vectors forms a basis and any system of n + 1 vectors is linearly 
dependent. 

Notice that throughout the foregoing we have assumed that a 
vector space consists of not only a zero vector. A space consisting of 
a zero vector alone has no basis in the above sense, and by definition 
its dimension is assumed to equal zero. 
The concept of basis plays a great role in the study of finite dimensional
vector spaces and we shall continually use it for this purpose.
It allows a very easy description of the structure of any vector space 
over an arbitrary field P. In addition it can be used to construct 
a very efficient method, reducing operations on elements of a space to the 
corresponding operations on numbers from a field P. 

As shown above, any vector x of a vector space K may be represented
as a linear combination

$x = \alpha_1 e_1 + \alpha_2 e_2 + \dots + \alpha_n e_n$, (16.1)

where $\alpha_1, \alpha_2, \dots, \alpha_n$ are some numbers from P and
$e_1, e_2, \dots, e_n$ constitute a basis of K. The linear combination (16.1)
is called the expansion of a vector x with respect to a basis and the
numbers $\alpha_1, \alpha_2, \dots, \alpha_n$ are the coordinates of x relative
to that basis. The fact that x is given by its coordinates
$\alpha_1, \alpha_2, \dots, \alpha_n$ will be written as follows:

$x = (\alpha_1, \alpha_2, \dots, \alpha_n)$.

As a rule, we shall not indicate to which basis the given coordinates 
relate, unless any ambiguity arises. 

It is easy to show that for any vector x in K its expansion with 
respect to a basis is unique. This can be proved by a device very often 
used in solving problems concerning linear dependence. Suppose 
there is another expansion 

$x = \beta_1 e_1 + \beta_2 e_2 + \dots + \beta_n e_n$. (16.2)

Subtracting term by term (16.2) from (16.1) we get

$(\alpha_1 - \beta_1) e_1 + (\alpha_2 - \beta_2) e_2 + \dots + (\alpha_n - \beta_n) e_n = 0$.

Since $e_1, e_2, \dots, e_n$ are linearly independent, it follows that all
coefficients of the linear combination are zero and hence expansions
(16.1) and (16.2) coincide.

Thus, with a basis of a vector space K fixed, every vector in K 
is uniquely determined by the collection of its coordinates relative 
to that basis. 

Now let any two vectors x and y in K be given by their coordinates
relative to the same basis $e_1, e_2, \dots, e_n$, i.e.

$x = \alpha_1 e_1 + \alpha_2 e_2 + \dots + \alpha_n e_n$,

$y = \gamma_1 e_1 + \gamma_2 e_2 + \dots + \gamma_n e_n$,

then

$x + y = (\alpha_1 + \gamma_1) e_1 + (\alpha_2 + \gamma_2) e_2 + \dots + (\alpha_n + \gamma_n) e_n$.

Also, for any number $\lambda$ in the field P,

$\lambda x = (\lambda\alpha_1) e_1 + (\lambda\alpha_2) e_2 + \dots + (\lambda\alpha_n) e_n$.



It follows that in adding two vectors of a vector space their coordinates 
relative to any basis are added and in multiplying a vector by a number 
all its coordinates are multiplied by that number. 
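
As an illustration of coordinates relative to a basis, the sketch below finds them by solving a linear system exactly. It rests on assumptions not made in the text: the space is taken to be the space of n-tuples of rational numbers, the basis is given as n such tuples, and the function name `coordinates` is hypothetical.

```python
from fractions import Fraction

def coordinates(basis, x):
    """Coordinates of x relative to a basis of n vectors, each an n-tuple."""
    n = len(basis)
    # augmented matrix: column j holds the j-th basis vector, the last column holds x
    m = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(x[i])] for i in range(n)]
    for col in range(n):
        pivot = next((i for i in range(col, n) if m[i][col] != 0), None)
        if pivot is None:
            raise ValueError("the given vectors are linearly dependent")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [a / p for a in m[col]]
        for i in range(n):
            if i != col and m[i][col] != 0:
                f = m[i][col]
                m[i] = [a - f * b for a, b in zip(m[i], m[col])]
    return [m[i][n] for i in range(n)]

basis = [(1, 1), (1, -1)]
x, y = (3, 1), (0, 2)
cx, cy = coordinates(basis, x), coordinates(basis, y)
x_plus_y = tuple(a + b for a, b in zip(x, y))
print(cx, cy, coordinates(basis, x_plus_y))
```

Because the expansion with respect to a basis is unique, the elimination either returns exactly one list of coordinates or reports that the given vectors do not form a basis.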

Exercises 

1. Prove that the rank of a system of vectors coincides 
with the dimension of its span. 

2. Prove that equivalent systems of vectors have the same rank. 

3. Prove that if a span $L_1$ is constructed on the vectors of a span $L_2$, then
$\dim L_1 \le \dim L_2$.

4. Prove that if a span $L_1$ is constructed on the vectors of a span $L_2$ and
$\dim L_1 = \dim L_2$, then the spans coincide.

5. Prove that a vector space of polynomials with real coefficients given over 
a field of real numbers is infinite dimensional. 

17. Simple examples of vector spaces 

The fundamental concepts of linear dependence 
and of basis can be illustrated by very simple but instructive examples 
if we take as vector spaces numerical sets with the usual operations 
of addition and multiplication. That the axioms of a vector space 
hold for such sets is quite obvious and therefore we shall not verify 
their validity. As before elements of a space will be called vectors. 

Consider the complex vector space which is an additive group of 
all complex numbers with multiplication over the field of complex 
numbers. It is clear that any nonzero number $z_1$ is a linearly
independent vector. But even any two nonzero vectors $z_1$ and $z_2$ are
always linearly dependent. To prove this it suffices to find two complex
numbers $\alpha_1$ and $\alpha_2$, not both equal to zero, such that
$\alpha_1 z_1 + \alpha_2 z_2 = 0$. But this equation is obvious for
$\alpha_1 = -z_2$ and $\alpha_2 = z_1$.
Therefore the vector space considered is one-dimensional.

Somewhat different is the real vector space which is an additive
group of all complex numbers with multiplication over the field
of real numbers. As coefficients of a linear combination we can now
use only real numbers and therefore this vector space cannot be
one-dimensional. Indeed, there are no real numbers $\alpha_1$ and $\alpha_2$,
not both zero, such that for them the linear combination
$\alpha_1 z_1 + \alpha_2 z_2$ would vanish, for example, when $z_1 = 1$ and
$z_2 = i$. It is left as an exercise
for the reader to prove that this vector space is two-dimensional.
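
The difference between the two spaces can be seen in a short computation. The sketch below uses Python's built-in complex numbers. The function name `real_coordinates` is hypothetical, and the system 1, i is taken here as a basis over the real field without proof, since that is the exercise left to the reader.

```python
def real_coordinates(z):
    """Coordinates of the complex number z relative to 1, i over the real field."""
    return (z.real, z.imag)

z1, z2 = 2 + 1j, 1 - 3j

# over the complex field the coefficients -z2 and z1 give a vanishing combination
print((-z2) * z1 + z1 * z2)

# over the real field each number is described by two real coordinates
print(real_coordinates(z1), real_coordinates(z2), real_coordinates(z1 + z2))
```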

It is important to stress that although both of the above spaces 
consist of the same elements they are fundamentally different from 
each other. 

It is now clear that the real vector space which is an additive 
group of all real numbers with multiplication over the field of real 
numbers is a one-dimensional space. We consider further the rational 
vector space which is an additive group of all real numbers with 
multiplication over the field of rational numbers. 
We shall try, as before, to construct a system containing a maximum
number of linearly independent vectors $r_1, r_2, r_3, \dots$. It is
clear that we may take, for example, $r_1 = 1$. Since only rational
numbers $\alpha_1, \alpha_2, \alpha_3, \dots$ are allowed to be the coefficients
of linear combinations, it is clear that no number of the form
$\alpha_1 \cdot 1$ can represent, for example, $\sqrt{2}$. Therefore this
space cannot be one-dimensional. Consequently, it is $\sqrt{2}$ that can
be taken as a second vector linearly independent of the identity element.
A number of the form $\alpha_1 \cdot 1 + \alpha_2 \sqrt{2}$ cannot, however,
represent, for example, $\sqrt[4]{2}$.

Indeed, let $\sqrt[4]{2} = \alpha_1 + \alpha_2 \sqrt{2}$ hold for some rational
numbers $\alpha_1$ and $\alpha_2$. Squaring both sides of the equation we get

$\sqrt{2} = (\alpha_1^2 + 2\alpha_2^2) + 2\alpha_1\alpha_2\sqrt{2}$

or

$\frac{2(1 - 2\alpha_1\alpha_2)}{\alpha_1^2 + 2\alpha_2^2} = \sqrt{2}$.

This is impossible since the left-hand side is a rational number and the
right-hand side is irrational.

So the space under consideration cannot be two-dimensional 
either. But then what sort is it? Surprising as it may be, it is infinite 
dimensional. However, the proof of this fact is beyond the scope of 
this book. 

The particular attention we have given to the examples of vector 
spaces of small dimensions is due to the possibility of using them to 
construct vector spaces of any dimensions. But we shall discuss 
this later on. 


Exercises 

1. What is the dimension of the vector space of rational
numbers over the field of rational numbers?

2. Construct linearly independent systems of vectors in the space of
complex numbers over the field of rational numbers.

3. Is an additive group of rational numbers over the field of real numbers
a vector space? If not, why?


18. Vector spaces of directed line segments

We have already noted earlier that the sets 
of collinear directed line segments, of coplanar directed line segments 
and of directed line segments in the whole of space form vector 
spaces over the field of real numbers. Our immediate task is to show 
their dimensions and to construct their bases. 
Lemma 18.1. A necessary and sufficient condition for the linear
dependence of two vectors is that they should be collinear.

Proof. Notice that the lemma is obvious if of two vectors at least
one is zero. It will therefore be assumed that both vectors are
nonzero.

Let a and b be linearly dependent vectors. Then there are numbers
$\alpha$ and $\beta$, not both zero, such that

$\alpha a + \beta b = 0$.

Since under the hypothesis $a \ne 0$ and $b \ne 0$, we have $\alpha \ne 0$ and
$\beta \ne 0$ and therefore

$b = (-\alpha/\beta) a$.

Consequently, by the definition of multiplication of a directed line
segment by a number, a and b are collinear.

Suppose now that a and b are collinear vectors. Apply them to a
common point O. They will be on some straight line which is turned
into an axis by specifying on it a direction. The vectors a and b
are nonzero and therefore there is a real $\lambda$ such that the magnitude
of the directed line segment a equals the product of the magnitude of
b by $\lambda$, i.e. $\{a\} = \lambda \{b\}$. But by the definition of multiplication of
a directed line segment by a number this means that $a = \lambda b$. So the
vectors a and b are linearly dependent.

It follows from the above lemma that a vector space of collinear 
directed line segments is a one-dimensional space and that any nonzero 
vector may serve as its basis. 

Lemma 18.1 allows us to deduce one useful consequence. Namely,
if vectors a and b are collinear and $a \ne 0$, then there is a number $\lambda$
such that $b = \lambda a$. Indeed, these vectors are linearly dependent, i.e.
for some numbers $\alpha$ and $\beta$, not both zero, $\alpha a + \beta b = 0$. If it is
assumed that $\beta = 0$, then, since $a \ne 0$, it follows that $\alpha = 0$.
Therefore $\beta \ne 0$ and we may take $\lambda = (-\alpha)/\beta$.

Lemma 18.2. A necessary and sufficient condition for three vectors 
to be linearly dependent is that they should be coplanar. 

Proof. We may assume without loss of generality that no pair of 
the three vectors is collinear since otherwise the lemma follows 
immediately from Lemma 18.1. 

So let a, b and c be three linearly dependent vectors. Then we can
find real numbers $\alpha$, $\beta$ and $\gamma$, not all zero, such that

$\alpha a + \beta b + \gamma c = 0$.

If, for example, $\gamma \ne 0$, then this equation yields

$c = (-\alpha/\gamma) a + (-\beta/\gamma) b$.

Apply a, b and c to a common point O. Then it follows from the
last equation that the vector c is equal to the diagonal of the
parallelogram constructed on the vectors $(-\alpha/\gamma) a$ and $(-\beta/\gamma) b$.
This means that after translation to the common point the vectors a, b
and c are found to be in the same plane and consequently they are coplanar.

Suppose now that a, b and c are coplanar vectors. Translate them
to the same plane and apply them to the common point O (Fig. 18.1).
Draw through the terminal point of c the straight lines parallel to
a and b and consider the parallelogram OACB. The vectors a,
$\overrightarrow{OA}$ and b, $\overrightarrow{OB}$ are collinear by construction
and nonzero and therefore there are numbers $\lambda$ and $\mu$ such that

$\overrightarrow{OA} = \lambda a$, $\overrightarrow{OB} = \mu b$.

But $\overrightarrow{OC} = \overrightarrow{OA} + \overrightarrow{OB}$, which means
that $c = \lambda a + \mu b$ or

$\lambda a + \mu b + (-1) c = 0$.

Since $\lambda$, $\mu$ and $-1$ are clearly different from zero, the last equation
implies that a, b and c are linearly dependent.

We can now settle the question concerning the dimension of the
vector space of coplanar directed line segments. By Lemma 18.2
the dimension of this space must be less than three. But any two
noncollinear directed line segments of this space are linearly
independent. Therefore the vector space of coplanar directed line
segments is a two-dimensional space and any two noncollinear vectors
may serve as its basis.

Lemma 18.3. Any four vectors are linearly 
dependent. 

Proof. We may assume without loss of generality that no triple of
the four vectors is coplanar since otherwise the lemma follows
immediately from Lemma 18.2.

Apply vectors a, b, c and d to a common origin O and draw through
the terminal point D of d the planes parallel to the planes determined
respectively by the pairs of vectors b, c; a, c; a, b (Fig. 18.2).
It follows from the parallelogram law of vector addition that

$\overrightarrow{OD} = \overrightarrow{OC} + \overrightarrow{OE}$,
$\overrightarrow{OE} = \overrightarrow{OA} + \overrightarrow{OB}$,

therefore

$\overrightarrow{OD} = \overrightarrow{OA} + \overrightarrow{OB} + \overrightarrow{OC}$. (18.1)

The vectors a, $\overrightarrow{OA}$, as well as b, $\overrightarrow{OB}$ and c,
$\overrightarrow{OC}$, are collinear by construction, with a, b and c being
nonzero. Therefore there are numbers $\lambda$, $\mu$ and $\nu$ such that

$\overrightarrow{OA} = \lambda a$, $\overrightarrow{OB} = \mu b$, $\overrightarrow{OC} = \nu c$.

Considering (18.1) this yields

$d = \lambda a + \mu b + \nu c$

from which it follows that a, b, c and d are linearly dependent.

From Lemma 18.3 we conclude that the dimension of the vector 
space of all directed line segments must be less than four. But it 
cannot be less than three since by Lemma 18.2 any three noncoplanar 
directed line segments are linearly independent. Therefore the vector 
space of all directed line segments is a three-dimensional space and 
any three noncoplanar vectors may serve as its basis. 
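
The three lemmas can be checked on concrete vectors once coordinates are available. The sketch below is an illustration under an assumption that goes beyond this section: each directed line segment is represented by a triple of rational coordinates relative to some fixed basis of three noncoplanar vectors. The helper names `rank` and `describe` are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a system of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def describe(vectors):
    """Geometrical meaning of the rank of a system of coordinate triples."""
    return {0: "all zero", 1: "collinear", 2: "coplanar", 3: "not coplanar"}[rank(vectors)]

print(describe([(1, 2, 3), (2, 4, 6)]))             # two linearly dependent vectors
print(describe([(1, 0, 0), (0, 1, 0), (1, 1, 0)]))  # three linearly dependent vectors
four = [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)]
print(rank(four) < len(four))                       # any four vectors are dependent
```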

The vector spaces considered are not very obvious geometrically, 
since they allow the existence of infinitely many equal vectors. They 
become much more obvious if we choose one representative from 
each class of equal vectors and always mean by “vector” a directed
line segment from the collection of only these representatives. 

One of the most convenient ways of choosing a vector is through 
considering the set of directed line segments fixed at some point 0. 
Then instead of the vector space of collinear directed line segments 
we obtain a space of collinear line segments fixed at 0 and lying 
on a straight line passing through 0 ; instead of the vector space of 
coplanar directed line segments we obtain a space of directed 
line segments fixed at 0 and lying in the plane through 0 ; 
and finally instead of the vector space of all directed line segments 
we obtain a space of directed line segments fixed at 0. 

In what follows we shall deal mostly with only fixed vectors. The
corresponding vector spaces will be denoted by $V_1$, $V_2$ and $V_3$, where
the subscript stands for dimension. A vector space consisting of
only a zero directed line segment will be denoted by $V_0$.

These spaces allow us to establish a one-to-one correspondence between
points and directed line segments. To do this it suffices to assign
to every vector its terminal point. Bearing this geometrical
interpretation in mind we shall sometimes call elements of an abstract
vector space points rather than vectors.
Exercises

In $V_1$, $V_2$ and $V_3$ establish the geometrical meaning of such notions as:

1. Span. 

2. Linear dependence and independence. 

3. Equivalent systems of vectors. 

4. Elementary equivalent transformations of a system of vectors. 

5. Rank of a system of vectors. 


19. The sum and intersection 
of subspaces 

The introduction of spans has shown that
every vector space contains an infinite number of other vector spaces.
The significance of these spaces is not limited to the questions
considered above.

We specified spans by directly indicating their structure. We
could use another way, that of defining “smaller” spaces in terms of
the properties of vectors. Let L be a set of vectors in a vector space K.
If under the same operations as in K the set L is a vector space, then
it is said to be a linear subspace, thus stressing the fact that a
subspace consists of vectors of some space. It is clear that the smallest
subspace is that consisting of only a zero vector. This will be called
a zero subspace and designated 0. The largest subspace is the space K.
These two subspaces are trivial, the others are nontrivial. It is also
obvious that together with every pair of its elements x and y any
subspace contains all their linear combinations $\alpha x + \beta y$. The
converse is also true. Namely:

If a set L of vectors of the vector space K contains together with
every pair of its elements x and y all their linear combinations
$\alpha x + \beta y$, then it is a subspace.

Indeed, of all vector space axioms it is necessary to verify only
the axioms of the zero and negative vectors. The rest of the axioms
are obvious. Take $\alpha = 0$ and $\beta = 0$. From the consequences of the
operations for vectors of K we conclude that $0 \cdot x + 0 \cdot y = 0$, i.e.
that the zero vector is in L. Now take $\alpha = -1$ and $\beta = 0$. We have
$(-1) x + 0 \cdot y = (-1) x$ and therefore together with every vector x
the set L contains its negative. So L is a subspace.

The existence of a basis says that in any finite dimensional space 
any subspace is a span. In finite dimensional vector spaces therefore 
the span is the most general way of giving linear subspaces. It is 
not so in an infinite dimensional space. Nevertheless it should not 
be forgotten that there is very much in common between concepts and 
facts from finite dimensional spaces and those from infinite dimensional
spaces. To emphasize this, even in finite dimensional spaces
we shall use the term linear subspace more often than span.
Let K be an n-dimensional space. As in the space K, a basis can be
constructed in any of its subspaces L. If a basis $e_1, e_2, \dots, e_n$ is
chosen in K, then in the general case one cannot choose basis vectors
of a subspace L directly among the vectors $e_1, e_2, \dots, e_n$ if only
because L may contain none of them. We have, however, the following
statement, which is in a sense the converse.

Lemma 19.1. If in some subspace L of dimension s an arbitrary
basis $t_1, \dots, t_s$ has been chosen, then we can always choose vectors
$t_{s+1}, \dots, t_n$ in a space K of dimension n in such a way that the system
of vectors $t_1, \dots, t_s, t_{s+1}, \dots, t_n$ is a basis in the whole of K.

Proof. Consider only the linearly independent systems of vectors
in K that contain the vectors $t_1, \dots, t_s$. It is clear that among these
systems there is a system $t_1, \dots, t_s, t_{s+1}, \dots, t_p$ with a maximum
number of vectors. But then whatever the vector x in K may be the
system $t_1, \dots, t_p, x$ must be linearly dependent. Consequently, x
must be linearly expressible in terms of the vectors $t_1, \dots, t_p$. This
means that $t_1, \dots, t_s, t_{s+1}, \dots, t_p$ form a basis of K and that p = n.
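
The lemma can be illustrated by a small constructive sketch. The proof above only asserts that a maximal system exists. The code below swaps in a concrete procedure under stated assumptions: K is taken to be the space of n-tuples of rational numbers, the candidates for the new vectors are the unit tuples, and the names `rank` and `extend_to_basis` are hypothetical.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a system of vectors, by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def extend_to_basis(ts, n):
    """Extend a linearly independent system ts of n-tuples to a basis of the whole space."""
    basis = list(ts)
    for i in range(n):
        unit = tuple(1 if j == i else 0 for j in range(n))
        # keep the unit vector only if the enlarged system stays linearly independent
        if rank(basis + [unit]) == len(basis) + 1:
            basis.append(unit)
    return basis

print(extend_to_basis([(1, 1, 0)], 3))
```

The given vectors always stay at the front of the result, so a basis of the subspace is kept intact and only new vectors are joined to it.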

Consider again an arbitrary vector space K. It generates the set
of all its subspaces, which we denote by U. On U we can define two
algebraic operations allowing some subspaces to be constructed
from other subspaces.

The sum $L_1 + L_2$ of linear subspaces $L_1$ and $L_2$ is the set of all
vectors of the form z = x + y, where $x \in L_1$ and $y \in L_2$.

The intersection $L_1 \cap L_2$ of linear subspaces $L_1$ and $L_2$ is the set of
all vectors simultaneously lying in $L_1$ and $L_2$.

Notice that both the sum of subspaces and their intersection are
nonempty sets, since they clearly contain the zero vector of K. We
prove that they are subspaces.

Indeed, take two arbitrary vectors $z_1$ and $z_2$ from the sum $L_1 + L_2$.
This means that $z_1 = x_1 + y_1$ and $z_2 = x_2 + y_2$, where $x_1, x_2 \in L_1$
and $y_1, y_2 \in L_2$. Consider now an arbitrary linear combination
$\alpha z_1 + \beta z_2$. We have
$\alpha z_1 + \beta z_2 = (\alpha x_1 + \beta x_2) + (\alpha y_1 + \beta y_2)$. Since
$\alpha x_1 + \beta x_2 \in L_1$ and $\alpha y_1 + \beta y_2 \in L_2$, we have
$\alpha z_1 + \beta z_2 \in L_1 + L_2$.
Therefore $L_1 + L_2$ is a subspace. Now let $z_1, z_2 \in L_1 \cap L_2$, i.e.
$z_1, z_2 \in L_1$ and $z_1, z_2 \in L_2$. It is clear that
$\alpha z_1 + \beta z_2 \in L_1$ and $\alpha z_1 + \beta z_2 \in L_2$, i.e.
$\alpha z_1 + \beta z_2 \in L_1 \cap L_2$. Hence $L_1 \cap L_2$ is also a subspace.

Thus the operations of addition of subspaces and of their intersection
are algebraic. They are obviously commutative and associative.
Moreover, for any subspace L of K

$L + 0 = L$, $L \cap K = L$.

There are no distributive laws relating the two operations.

As can easily be seen even from the simplest examples, the dimension
of the sum of two arbitrary subspaces depends not only
on those of the subspaces but also on the size of their common part.
We have
Theorem 19.1. For any two finite dimensional subspaces $L_1$ and $L_2$

$\dim (L_1 \cap L_2) + \dim (L_1 + L_2) = \dim L_1 + \dim L_2$. (19.1)

Proof. Denote the dimensions of $L_1$, $L_2$ and $L_1 \cap L_2$ by $r_1$, $r_2$ and m
respectively. Choose at the intersection $L_1 \cap L_2$ some basis
$c_1, \dots, c_m$. These vectors are linearly independent and are in $L_1$. By
Lemma 19.1 there are vectors $a_1, \dots, a_k$ in $L_1$ such that the system
$a_1, \dots, a_k, c_1, \dots, c_m$ is a basis in $L_1$. Similarly there are vectors
$b_1, \dots, b_p$ in $L_2$ such that $b_1, \dots, b_p, c_1, \dots, c_m$ is a basis in $L_2$.

We have

$r_1 = k + m$, $r_2 = p + m$.

If we prove that

$a_1, \dots, a_k, c_1, \dots, c_m, b_1, \dots, b_p$ (19.2)

is a basis of the subspace $L_1 + L_2$, then the theorem holds since

$m + (k + m + p) = (k + m) + (p + m)$.

Any vector in $L_1$ and $L_2$ is linearly expressible in terms of the
vectors of its basis and of course any is linearly expressible in terms
of vectors (19.2). Therefore any vector in the sum $L_1 + L_2$ is also
linearly expressible in terms of these vectors. It remains for us to
show that system (19.2) is linearly independent. Let

$\alpha_1 a_1 + \dots + \alpha_k a_k + \gamma_1 c_1 + \dots + \gamma_m c_m + \beta_1 b_1 + \dots + \beta_p b_p = 0$ (19.3)

and let

$b = \beta_1 b_1 + \dots + \beta_p b_p$. (19.4)

It is clear that $b \in L_2$. But from (19.3) it follows that $b \in L_1$.
Consequently, $b \in L_1 \cap L_2$, i.e.

$b = \nu_1 c_1 + \dots + \nu_m c_m$ (19.5)

for some numbers $\nu_1, \dots, \nu_m$. Comparing (19.4) and (19.5) we get

$\beta_1 b_1 + \dots + \beta_p b_p + (-\nu_1) c_1 + \dots + (-\nu_m) c_m = 0$.

The system of vectors $b_1, \dots, b_p, c_1, \dots, c_m$ is linearly independent
by construction and therefore

$\beta_1 = \dots = \beta_p = \nu_1 = \dots = \nu_m = 0$.

By virtue of the linear independence of $a_1, \dots, a_k, c_1, \dots, c_m$ it
now follows from (19.3) that

$\alpha_1 = \dots = \alpha_k = \gamma_1 = \dots = \gamma_m = 0$.

Thus the theorem is proved.
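
Formula (19.1) can be checked on an example by computing bases of the sum and of the intersection directly. The sketch below does not follow the proof. It uses a different technique, the elimination scheme usually called the Zassenhaus algorithm, and assumes that vectors are n-tuples of rational numbers. The names `echelon` and `sum_and_intersection` are hypothetical. Rows of the form (u, u) for the vectors u spanning the first subspace and (v, 0) for the vectors v spanning the second are reduced to echelon form. Rows with a nonzero left half then give a basis of the sum, and rows with a zero left half and a nonzero right half give a basis of the intersection.

```python
from fractions import Fraction

def echelon(vectors):
    """Reduce the rows to echelon form by Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    r = 0
    for col in range(len(rows[0]) if rows else 0):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            f = rows[i][col] / rows[r][col]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return rows

def sum_and_intersection(a, b):
    """Bases of the sum and of the intersection of the spans of a and b."""
    n = len(a[0])
    block = [list(u) + list(u) for u in a] + [list(v) + [0] * n for v in b]
    rows = echelon(block)
    sum_basis = [row[:n] for row in rows if any(row[:n])]
    int_basis = [row[n:] for row in rows if not any(row[:n]) and any(row[n:])]
    return sum_basis, int_basis

a = [(1, 0, 0), (0, 1, 0)]  # a basis of the first subspace
b = [(0, 1, 0), (0, 0, 1)]  # a basis of the second subspace
sum_basis, int_basis = sum_and_intersection(a, b)
print(len(int_basis), len(sum_basis), len(a), len(b))
```

Here the two subspaces are coordinate planes, their sum is the whole space and their intersection is a straight line, so both sides of (19.1) equal four.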
Exercises

1. Consider a vector space $V_3$ to establish the geometrical
meaning of the sum and intersection of subspaces.

2. What is the sum of subspaces $V_1$ and $V_2$?

3. What is the intersection of subspaces $V_1$ and $V_2$?

4. Prove that the dimension of the intersection of any number of subspaces 
does not exceed the minimum dimension of those subspaces. 

5. Prove that the dimension of the sum of any number of subspaces is not 
less than the maximum dimension of those subspaces. 


20. The direct sum of subspaces 

Let $L_1, L_2, \dots, L_m$ be subspaces of some
vector space. By the definition of the operation of addition any
vector x in the sum

$K = L_1 + L_2 + \dots + L_m$ (20.1)

may be represented as

$x = x_1 + x_2 + \dots + x_m$, (20.2)

where $x_i \in L_i$ for every i. In general this representation is not unique.
But if every vector in K allows the unique representation (20.2), then
sum (20.1) is called a direct sum and is designated as follows:

$K = L_1 \dotplus L_2 \dotplus \dots \dotplus L_m$. (20.3)

Direct sums possess many special properties. But we shall be 
concerned not so much with these properties as with the common 
features of representation (20.2) and expansion with respect to a basis. 
Suppose some space K may be represented as a direct sum (20.3) 
of its subspaces $L_1, L_2, \dots, L_m$. Then by virtue of the uniqueness of
representation (20.2) the system of subspaces $L_1, L_2, \dots, L_m$ may be
regarded as some “generalized basis” of K and representation (20.2) 
as an expansion with respect to the “generalized basis”. Such an 
interpretation of a direct sum is especially helpful in the study of 
vector spaces of higher dimensions, since in those spaces one has as 
a rule to study not all the components in the expansion with respect 
to a basis but only a small portion of them. Using direct sums makes 
it possible to avoid both cumbersome expansions and investigating 
unnecessary details. 

Let K be an n-dimensional vector space. Take its arbitrary basis
$e_1, e_2, \dots, e_n$ and construct a collection of spans $L_1 = L(e_1)$,
$L_2 = L(e_2)$, \dots, $L_n = L(e_n)$. It is then obvious that K is the direct
sum of these n one-dimensional subspaces. But K may be represented
in various ways as a direct sum of subspaces of other dimensions. Such
a representation is based on
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Theorem 20.1. For a space K to be the direct sum of its subspaces 
L x , . . ., L m it is necessary and sufficient that the union of the bases of 
those subspaces should constitute a basis of the entire space. 

Proof. Let K be the direct sum of subspaces L x , ■ ■ ., L m and let 
vectors e x , . . ., e Si \ . . e Sjn +1 , . . e Sm constitute bases of those 
subspaces. Then for any vector x in K we have representation 
(20.2). By representing each of the vectors x t as an expansion with 
respect to the basis of the corresponding subspace L t we get 

x = <x,e 1 - ...+a, 1 e Jl +...+a 4m _ i+1 e Sfn i+1 +...+a 8m e Sm (20.4) 
for some numbers a lt . . ., a Sm> 

Thus every vector in K may be represented as a linear combination
of vectors $e_1, \ldots, e_{s_m}$. To assert that those vectors constitute a basis
of K it remains to prove that they are linearly independent. Consider
the equation

$\beta_1 e_1 + \ldots + \beta_{s_1} e_{s_1} + \ldots + \beta_{s_{m-1}+1} e_{s_{m-1}+1} + \ldots + \beta_{s_m} e_{s_m} = 0$  (20.5)

with numerical coefficients $\beta_1, \ldots, \beta_{s_m}$ and let

$\beta_1 e_1 + \ldots + \beta_{s_1} e_{s_1} = y_1,$
$\ldots$  (20.6)
$\beta_{s_{m-1}+1} e_{s_{m-1}+1} + \ldots + \beta_{s_m} e_{s_m} = y_m.$

It is obvious that $y_i \in L_i$ and it follows from (20.5) that

$0 = y_1 + \ldots + y_m.$

Every subspace contains a zero vector and therefore it is obvious
that

$0 = 0 + \ldots + 0.$

From the uniqueness of the expansion of the zero vector in K with
respect to the subspaces $L_1, \ldots, L_m$ we conclude that

$y_1 = \ldots = y_m = 0.$

Since the vectors in each of the linear combinations (20.6) form a basis
of the corresponding subspace, it follows that all the coefficients of
those combinations are zero, i.e. that the vectors $e_1, \ldots, e_{s_m}$ are
linearly independent.

Suppose now that the vectors $e_1, \ldots, e_{s_1}$; ...; $e_{s_{m-1}+1}, \ldots, e_{s_m}$
constituting the bases of $L_1, \ldots, L_m$ form a basis of K. Then for
any vector x in K there is a unique expansion (20.4). Letting

$\alpha_1 e_1 + \ldots + \alpha_{s_1} e_{s_1} = x_1,$
$\ldots$  (20.7)
$\alpha_{s_{m-1}+1} e_{s_{m-1}+1} + \ldots + \alpha_{s_m} e_{s_m} = x_m,$

we see that for x there is at least one representation (20.2). Every
vector $x_i$ in (20.7) is a linear combination of basis vectors of $L_i$.
From the uniqueness of expansion (20.4) for x we conclude that 
representation (20.2) is also unique for it. Thus the theorem is proved. 
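
When the vectors are given by their coordinates, the criterion of Theorem 20.1 can be checked mechanically: collect the bases of the subspaces and test whether the union is a basis of the whole space. The following is a minimal sketch in Python, assuming rational coordinates; the names `rank` and `is_direct_sum` are illustrative and do not come from the text.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of coordinate vectors, found by exact elimination."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of leading rows found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_direct_sum(n, bases):
    """True if an n-dimensional space is the direct sum of the subspaces
    whose bases are listed in `bases` (criterion of Theorem 20.1)."""
    union = [v for basis in bases for v in basis]
    return len(union) == n and rank(union) == n

plane = [(1, 0, 0), (0, 1, 0)]
line_outside = [(1, 1, 1)]   # not contained in the plane
line_inside = [(1, 1, 0)]    # contained in the plane

sum_is_direct = is_direct_sum(3, [plane, line_outside])
sum_is_not_direct = is_direct_sum(3, [plane, line_inside])
```

The union of the bases is a basis exactly when it contains n vectors of rank n, which is the only thing the function tests.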

Exercises

1. Under what conditions is $V_3$ a direct sum of its
subspaces $V_1$ and $V_2$?

2. Under what conditions is $V_2$ a direct sum of two of its subspaces of the
type $V_1$?

3. Can $V_3$ be a direct sum of two of its subspaces of the type $V_2$? If not, why?

4. Prove that for sum (20.1) to be direct it is necessary and sufficient that
representation (20.2) should be unique for the zero vector.

5. Prove that for sum (20.1) to be direct it is necessary and sufficient that
the intersection of each of the subspaces $L_i$, $i = 1, \ldots, m$, with the sum of the
others should contain only a zero vector.


21. Isomorphism of vector spaces 

Consider the set of all vector spaces over the
same field P. It is natural to ask in what respects they are similar and in
what respects they differ.

The description of every vector space contains two, essentially 
different, parts. First, a vector space is a collection of specific objects 
called vectors. Second, the operations of addition and multiplication 
by a number that have some properties are defined on those specific 
objects. We may be concerned therefore either with the nature of 
vectors and their properties or with the properties of the operations 
regardless of the nature of the elements. 

We were concerned with the nature of vectors only when we studied
directed line segments, and only to the extent necessary for introducing
the operations and establishing their properties. After that our
investigation of directed line segments was based solely on the
properties of the operations. We shall proceed in a similar way in every
particular case too. Therefore two spaces with the same structure
of addition and multiplication by a number will be assumed to possess
the same properties, or to be isomorphic. More precisely:

Two vector spaces over the same field are said to be isomorphic if 
between their vectors a 1-1 correspondence can be established such that
to the sum of any two vectors of one space there corresponds the sum of 
the corresponding vectors of the other and to the product of some number 
by a vector of one space there corresponds the product of the same number 
by the corresponding vector of the other. 

Let K and K' be two isomorphic spaces. The fact that every vector
x in K is assigned a definite vector x' in K' may be understood as
the introduction of some “function”

$x' = \omega(x)$  (21.1)

whose “independent variable (or argument)” is a vector x in K and
whose “value” is a vector x' in K'. Both properties of that function
can now be written as follows. For any x and y in K and any
number $\lambda$

$\omega(x + y) = \omega(x) + \omega(y),$
$\omega(\lambda x) = \lambda\omega(x).$  (21.2)

The 1-1 correspondence between K and K' implies that to any
different independent variables of the function (21.1) there correspond
different values, i.e. if

$x \ne y,$  (21.3)

then

$\omega(x) \ne \omega(y).$  (21.4)

Consequently, the equality or nonequality of the values of the 
function implies respectively the equality or nonequality of its 
independent variables. 

Isomorphic spaces have much in common. In particular, to a zero 
vector there corresponds a zero vector, for 

$\omega(0) = \omega(0 \cdot x) = 0 \cdot \omega(x) = 0 \cdot x' = 0'.$


The most important consequence, however, is that a linearly independent
system of vectors is sent into a linearly independent system.

Indeed, let $x_1, x_2, \ldots, x_n$ be n linearly independent vectors. Consider
now a linear combination $\alpha_1\omega(x_1) + \alpha_2\omega(x_2) + \ldots + \alpha_n\omega(x_n)$
and equate it to zero. By the properties (21.2) of an isomorphism

$0' = \alpha_1\omega(x_1) + \alpha_2\omega(x_2) + \ldots + \alpha_n\omega(x_n)$
$= \omega(\alpha_1 x_1 + \alpha_2 x_2 + \ldots + \alpha_n x_n) = \omega(0),$

from which, the correspondence being 1-1, we have

$\alpha_1 x_1 + \alpha_2 x_2 + \ldots + \alpha_n x_n = 0.$

Since $x_1, x_2, \ldots, x_n$ are linearly independent, all the coefficients
must be zero.

The consequence we have proved makes it possible to state that 
if two finite dimensional vector spaces are isomorphic, then they 
have the same dimension. The converse is also true. Namely, we have 

Theorem 21.1. Any two finite dimensional vector spaces having the 
same dimension and given over the same field are isomorphic. 

Proof. Let K and K' be two vector spaces of dimension n. Choose
a basis $e_1, e_2, \ldots, e_n$ in K and a basis $e'_1, e'_2, \ldots, e'_n$ in K'. Using
these systems of vectors construct an isomorphism $\omega$ as follows.
To every vector

$x = \alpha_1 e_1 + \alpha_2 e_2 + \ldots + \alpha_n e_n$

in K assign a vector

$\omega(x) = \alpha_1 e'_1 + \alpha_2 e'_2 + \ldots + \alpha_n e'_n$

in K'. The correspondence will be 1-1 since an expansion with respect
to a basis is unique.

Take then any two vectors x and y in K and an arbitrary number $\lambda$
and assume that

$x = \alpha_1 e_1 + \alpha_2 e_2 + \ldots + \alpha_n e_n,$
$y = \beta_1 e_1 + \beta_2 e_2 + \ldots + \beta_n e_n.$

We have

$\omega(x + y) = \omega((\alpha_1 + \beta_1) e_1 + (\alpha_2 + \beta_2) e_2 + \ldots + (\alpha_n + \beta_n) e_n)$
$= (\alpha_1 + \beta_1) e'_1 + (\alpha_2 + \beta_2) e'_2 + \ldots + (\alpha_n + \beta_n) e'_n$
$= (\alpha_1 e'_1 + \alpha_2 e'_2 + \ldots + \alpha_n e'_n) + (\beta_1 e'_1 + \beta_2 e'_2 + \ldots + \beta_n e'_n) = \omega(x) + \omega(y),$

$\omega(\lambda x) = \omega((\lambda\alpha_1) e_1 + (\lambda\alpha_2) e_2 + \ldots + (\lambda\alpha_n) e_n)$
$= (\lambda\alpha_1) e'_1 + (\lambda\alpha_2) e'_2 + \ldots + (\lambda\alpha_n) e'_n$
$= \lambda(\alpha_1 e'_1 + \alpha_2 e'_2 + \ldots + \alpha_n e'_n) = \lambda\omega(x).$

These equations prove the theorem. 
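
The construction in the proof can be tried out on a small case. The sketch below, in Python, takes K to be the space of triples of rational numbers with the unit triples as basis and K' to be the space of symmetric 2×2 matrices with the basis `E_PRIME`; both spaces, the chosen bases and all function names are assumptions made for illustration only.

```python
from fractions import Fraction

# Assumed basis e'_1, e'_2, e'_3 of K': symmetric 2x2 matrices as nested tuples.
E_PRIME = (
    ((1, 0), (0, 0)),
    ((0, 1), (1, 0)),
    ((0, 0), (0, 1)),
)

def mat_add(a, b):
    """Sum of two matrices of K'."""
    return tuple(tuple(p + q for p, q in zip(ra, rb)) for ra, rb in zip(a, b))

def mat_scale(lam, a):
    """Product of the number lam and a matrix of K'."""
    return tuple(tuple(lam * p for p in row) for row in a)

def omega(x):
    """Send x = a1*e1 + a2*e2 + a3*e3 to a1*e'1 + a2*e'2 + a3*e'3."""
    result = ((0, 0), (0, 0))
    for alpha, e in zip(x, E_PRIME):
        result = mat_add(result, mat_scale(alpha, e))
    return result

x = (Fraction(1), Fraction(2), Fraction(3))
y = (Fraction(-1), Fraction(1, 2), Fraction(0))
lam = Fraction(5, 3)

x_plus_y = tuple(a + b for a, b in zip(x, y))
lam_x = tuple(lam * a for a in x)

# The two properties (21.2) of an isomorphism, checked on one example.
additive = omega(x_plus_y) == mat_add(omega(x), omega(y))
homogeneous = omega(lam_x) == mat_scale(lam, omega(x))
```

A check on particular vectors is of course not a proof; it only shows the correspondence of the theorem at work.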

This theorem is very important. It is this theorem that allows us
to say with certainty now that in terms of all the consequences of
the axioms any two vector spaces having the same dimension and
given over the same field are indistinguishable. Consequently, we
could construct just one n-dimensional vector space over a given
field and show the regularities common to all finite dimensional
spaces by investigating just that single space.

Let P be some field. Consider a set whose elements are all possible
ordered collections of n numbers $\alpha_1, \alpha_2, \ldots, \alpha_n$ from P. If x is an
element of that set, then we shall write

$x = (\alpha_1, \alpha_2, \ldots, \alpha_n).$  (21.5)

The operations of addition and multiplication by a number $\lambda$ from
the field P will be defined as follows:

$(\alpha_1, \alpha_2, \ldots, \alpha_n) + (\beta_1, \beta_2, \ldots, \beta_n) = (\alpha_1 + \beta_1, \alpha_2 + \beta_2, \ldots, \alpha_n + \beta_n),$  (21.6)
$\lambda(\alpha_1, \alpha_2, \ldots, \alpha_n) = (\lambda\alpha_1, \lambda\alpha_2, \ldots, \lambda\alpha_n).$

It is easy to check that the axioms of a vector space hold. In
particular, the zero vector is defined by a set of zeros alone, i.e.

$0 = (0, 0, \ldots, 0),$

and the negative of vector (21.5) is like this:

$-x = (-\alpha_1, -\alpha_2, \ldots, -\alpha_n).$

This is an n-dimensional space and one of its bases is easy to show
at once. Namely,

$e_1 = (1, 0, \ldots, 0, 0),$
$e_2 = (0, 1, \ldots, 0, 0),$
$\ldots$  (21.7)
$e_n = (0, 0, \ldots, 0, 1).$

Since for element (21.5) we have the expansion

$x = \alpha_1 e_1 + \alpha_2 e_2 + \ldots + \alpha_n e_n,$

the numbers $\alpha_1, \alpha_2, \ldots, \alpha_n$ will be called the coordinates of the
vector x.
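
The operations (21.6) and the basis (21.7) translate directly into code. The following is a minimal sketch in Python, assuming that vectors are stored as tuples; the function names are illustrative and not part of the text.

```python
def add(x, y):
    """Componentwise sum of two vectors, as in (21.6)."""
    return tuple(a + b for a, b in zip(x, y))

def scale(lam, x):
    """Product of the number lam and the vector x, as in (21.6)."""
    return tuple(lam * a for a in x)

def unit_vector(i, n):
    """The basis vector e_i of (21.7), with i counted from 1."""
    return tuple(1 if j == i else 0 for j in range(1, n + 1))

x = (3, -1, 4)
n = len(x)

# Rebuild x from its coordinates: x = a_1 e_1 + ... + a_n e_n.
rebuilt = (0,) * n
for i, alpha in enumerate(x, start=1):
    rebuilt = add(rebuilt, scale(alpha, unit_vector(i, n)))
```

Here `rebuilt` coincides with `x`, which is the statement that the entries of the collection are its coordinates in the basis (21.7).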

We shall call a space of such a type an arithmetical space and denote
it by $P_n$, thus emphasizing its relation to the field P. If P is the field
of complex, real or rational numbers, then such n-dimensional spaces
will be denoted by $C_n$, $R_n$ and $D_n$ respectively.

It may now seem that there is no need to study arbitrary n-dimensional
vector spaces. Indeed, we know that in terms of the consequences
of the axioms isomorphic vector spaces are indistinguishable
and therefore we can always successfully study, for example, $P_n$
alone. However, general arguments allow us to show the most important
properties of vector spaces, i.e. the ones that are independent of
basis systems or, in other words, are invariant under isomorphisms.

Studying spaces $P_n$ alone we should always be tied to a particular
basis and therefore it would not always be easy to see the invariance
of deductions. Besides, it is necessary to see to it that particular
properties of a space $P_n$ are not mistaken for general properties of
vector spaces. This is not always easy to do.

In conclusion we note one more fact. By analogy with a space $P_n$
consider the space $P_\infty$ whose elements are all possible ordered
infinite collections of numbers $\alpha_1, \alpha_2, \ldots$ of the field P. An
element x of that set is, by analogy with (21.5), designated

$x = (\alpha_1, \alpha_2, \ldots)$

and by analogy with (21.6) we introduce operations on elements.

Now $P_\infty$ is an infinite dimensional space. If we assume that
infinite dimensional spaces are isomorphic to $P_\infty$, it is not hard to
see that infinite dimensional and finite dimensional spaces must
have much in common. This example should not be forgotten.


Exercises 

1. Construct an isomorphism from a space $V_1$ to the
space of reals over the field of reals.

2. Construct an isomorphism from a space $V_2$ to the space of complex
numbers over the field of reals.

3. Prove that in isomorphic spaces equivalent vector systems correspond to
equivalent vector systems.

4. Prove that in isomorphic spaces an intersection of subspaces corresponds
to an intersection of subspaces.

5. Prove that in isomorphic spaces a direct sum of subspaces corresponds to
a direct sum of subspaces.


22. Linear dependence and systems of linear equations

Investigation of many questions associated 
in some way with linear dependence reduces to solving the following 
problem. 

Let $a_1, a_2, \ldots, a_m$ be a system of vectors and let b be a vector.
Determine whether b is a linear combination of the given vector
system and find the coefficients of the linear combination.

If b is a linear combination of $a_1, a_2, \ldots, a_m$, then there are
numbers $z_1, z_2, \ldots, z_m$ such that

$z_1 a_1 + z_2 a_2 + \ldots + z_m a_m = b.$  (22.1)

Consequently, the above problem reduces to the investigation of
the vector equation (22.1) for the numbers $z_1, z_2, \ldots, z_m$.

Suppose that the vectors are given by their coordinates in some
basis of a k-dimensional space, i.e.

$a_1 = (a_{11}, a_{21}, \ldots, a_{k1}),$
$a_2 = (a_{12}, a_{22}, \ldots, a_{k2}),$
$\ldots$
$a_m = (a_{1m}, a_{2m}, \ldots, a_{km}),$
$b = (b_1, b_2, \ldots, b_k).$

On equating the corresponding vector coordinates on the left and
right of (22.1) we get

$a_{11} z_1 + a_{12} z_2 + \ldots + a_{1m} z_m = b_1,$
$a_{21} z_1 + a_{22} z_2 + \ldots + a_{2m} z_m = b_2,$
$\ldots$  (22.2)
$a_{k1} z_1 + a_{k2} z_2 + \ldots + a_{km} z_m = b_k.$

This system of equations, which reflects the coordinatewise notation of
equation (22.1), is called a system of linear algebraic equations. The
numbers $b_1, b_2, \ldots, b_k$ are called the right-hand sides and
$z_1, z_2, \ldots, z_m$ are the unknowns of the system of equations. An ordered
collection of the values of the unknowns that satisfies each of the
equations (22.2) is called a solution of the system. If a system of
linear algebraic equations has at least one solution, then it is said
to be compatible; otherwise the system is incompatible.

Thus the answer to the question of whether b is a linear combination
of vectors $a_1, a_2, \ldots, a_m$ depends on whether (22.2) is compatible
or incompatible. If it is compatible, then any of its solutions
gives coefficients of the expansion of b with respect to the vector
system $a_1, a_2, \ldots, a_m$.

Two systems of linear algebraic equations in the same unknowns 
are said to be equivalent if each solution of one system is a solution 
of the other or both are incompatible. 

A general method of solving systems of equations may be based 
on a successive transformation of the original system (22.2) to such 
an equivalent system for which a solution is sufficiently easy to find. 
We shall now describe one of these methods, called the method of 
elimination or Gauss method. 

In general the solution process consists of at most $k - 1$ steps.
To distinguish the coefficients of the unknowns and the right-hand
sides obtained in the process of transformation at various steps we
shall use an additional index, a superscript. According to this
remark the original system (22.2) will have the following form:

$a_{11}^{(0)} z_1 + a_{12}^{(0)} z_2 + \ldots + a_{1m}^{(0)} z_m = b_1^{(0)},$
$a_{21}^{(0)} z_1 + a_{22}^{(0)} z_2 + \ldots + a_{2m}^{(0)} z_m = b_2^{(0)},$
$\ldots$  (22.3)
$a_{k1}^{(0)} z_1 + a_{k2}^{(0)} z_2 + \ldots + a_{km}^{(0)} z_m = b_k^{(0)}.$


Consider the first equation. If all the coefficients of the unknowns
and the right-hand side are equal to zero, then the equation will
hold for any collection of numbers $z_1, z_2, \ldots, z_m$. Consequently, we
obtain an equivalent system if the first equation is omitted altogether
from consideration. It may happen that all the coefficients
of the unknowns in the first equation are equal to zero but the
right-hand side is not. Then such an equation cannot hold for any
collection of numbers $z_1, z_2, \ldots, z_m$. In such a case the system is
incompatible and its investigation is finished.

Suppose that there is at least one nonzero coefficient among the
coefficients of the unknowns in the first equation. We may assume
without loss of generality that $a_{11}^{(0)} \ne 0$, since otherwise this can be
attained by rearranging the unknowns. The element $a_{11}^{(0)}$ is called
the leading element. We express $z_1$ in the first equation in terms of
the remaining unknowns and the right-hand side and then substitute
the expression obtained for $z_1$ in all the equations except the first.
Grouping similar terms everywhere we obtain a new system

$a_{11}^{(0)} z_1 + a_{12}^{(0)} z_2 + \ldots + a_{1m}^{(0)} z_m = b_1^{(0)},$
$a_{22}^{(1)} z_2 + \ldots + a_{2m}^{(1)} z_m = b_2^{(1)},$
$\ldots$  (22.4)
$a_{k2}^{(1)} z_2 + \ldots + a_{km}^{(1)} z_m = b_k^{(1)}.$







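The coefficients of this new system are connected to the coefficients
of the old one by the following relations:

$a_{ij}^{(1)} = a_{ij}^{(0)} - \frac{a_{i1}^{(0)} a_{1j}^{(0)}}{a_{11}^{(0)}}, \qquad b_i^{(1)} = b_i^{(0)} - \frac{a_{i1}^{(0)} b_1^{(0)}}{a_{11}^{(0)}}$

for every i and j with $i, j \ge 2$.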

Systems (22.3) and (22.4) are equivalent. Indeed, let system
(22.3) be compatible. Then any of its solutions $z_1, z_2, \ldots, z_m$ turns
all equations of (22.3) into identities. Repeating the process of
elimination with any of the solutions once again we see that it is
a solution of system (22.4) as well. Suppose further that some solution
of system (22.4) is not a solution of system (22.3). It clearly satisfies
the first equation of (22.3). Let it not satisfy some equation with an
index $i \ge 2$. Then, repeating once again the elimination process,
we conclude that the solution chosen must not satisfy the ith equation
of system (22.4). But this contradicts the hypothesis. It is now
clear that if one of the systems is incompatible, so is the other.

We have described only the first step in the transformation of the
system. All the other steps are carried out in a similar way. At the
second step we eliminate the unknown $z_2$ from all the equations
except the first two, at the third step we eliminate the unknown $z_3$
from all the equations except the first three and so on. If in the
process of transformations we do not encounter equations where all
coefficients of the unknowns are equal to zero, then in $k - 1$ steps
we arrive at the system

$a_{11}^{(0)} z_1 + a_{12}^{(0)} z_2 + \ldots + a_{1k}^{(0)} z_k + a_{1,k+1}^{(0)} z_{k+1} + \ldots + a_{1m}^{(0)} z_m = b_1^{(0)},$
$a_{22}^{(1)} z_2 + \ldots + a_{2k}^{(1)} z_k + a_{2,k+1}^{(1)} z_{k+1} + \ldots + a_{2m}^{(1)} z_m = b_2^{(1)},$
$\ldots$  (22.5)
$a_{kk}^{(k-1)} z_k + a_{k,k+1}^{(k-1)} z_{k+1} + \ldots + a_{km}^{(k-1)} z_m = b_k^{(k-1)},$


equivalent to system (22.4). If in the process of transformation we
encounter identically satisfied equations, then system (22.5) will
consist of a smaller number of equations.

The unknowns $z_{k+1}, \ldots, z_m$ are called free unknowns. It is obvious
that regardless of the values assigned to them we can successively
determine all the others from system (22.5) beginning with $z_k$.
The coefficients $a_{11}^{(0)}, a_{22}^{(1)}, \ldots, a_{kk}^{(k-1)}$ by which we have to divide
are the leading elements of individual steps and therefore they are
all nonzero.
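
The elimination process described above can be sketched in a few lines of Python. This is a variant under stated assumptions rather than a transcription of the text: it works in exact rational arithmetic, it finds a leading element by interchanging equations instead of renumbering the unknowns, and it sets every free unknown to zero so as to return one particular solution. The name `gauss_solve` is illustrative.

```python
from fractions import Fraction

def gauss_solve(a, b):
    """Return one solution of the system a z = b, or None if incompatible.

    a is a list of k rows of m coefficients, b a list of k right-hand sides.
    Free unknowns are set to zero.
    """
    k, m = len(a), len(a[0])
    rows = [[Fraction(c) for c in row] + [Fraction(rhs)] for row, rhs in zip(a, b)]
    pivots = []  # (row, column) of each leading element
    r = 0
    for col in range(m):
        p = next((i for i in range(r, k) if rows[i][col] != 0), None)
        if p is None:
            continue  # no leading element in this column: a free unknown
        rows[r], rows[p] = rows[p], rows[r]
        for i in range(r + 1, k):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [u - factor * v for u, v in zip(rows[i], rows[r])]
        pivots.append((r, col))
        r += 1
    # The remaining rows have zero coefficients; a nonzero right-hand side
    # in any of them makes the system incompatible.
    if any(row[m] != 0 for row in rows[r:]):
        return None
    z = [Fraction(0)] * m
    for row, col in reversed(pivots):  # back substitution
        s = sum(rows[row][j] * z[j] for j in range(col + 1, m))
        z[col] = (rows[row][m] - s) / rows[row][col]
    return z

unique = gauss_solve([[1, 2], [3, 4]], [5, 11])
incompatible = gauss_solve([[1, 1], [1, 1]], [1, 2])
with_free_unknowns = gauss_solve([[1, 1, 1]], [3])
```

Exact fractions are used on purpose, so that the test for a zero coefficient is reliable; with approximate arithmetic that test becomes a delicate matter.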

So from the theoretical point of view the concept of linear dependence
has been investigated sufficiently fully. In practice, however,
it may result in very serious difficulties. Consider, for example, in
a space $R_k$ a system of vectors

$a_1 = (1, -2, 0, \ldots, 0, 0),$
$a_2 = (0, 1, -2, \ldots, 0, 0),$
$\ldots$  (22.6)
$a_{k-1} = (0, 0, 0, \ldots, 1, -2),$
$a_k = (-2^{-(k-1)}, 0, 0, \ldots, 0, 1).$


It is linearly dependent since

$2^{-(k-1)} a_1 + 2^{-(k-2)} a_2 + \ldots + 2^{-1} a_{k-1} + a_k = 0.$

Notice that $2^{-(k-1)} < 10^{-12}$ for $k > 40$ and therefore it is only
natural that in practical calculations a desire arises to neglect so
small a value of the coordinate. Besides, as a rule the numbers are
not known exactly and nearly always contain significantly greater
errors. But even if the coordinates were known exactly, the very
first manipulations with them would lead to inexact results if the
calculations were made approximately. It should be added that
most modern computers cannot recognize numbers as small as
$2^{-(k-1)}$ for $k > 64$ and operate with them as with zeros. In actual
practice therefore instead of the system of vectors (22.6) we may
have to deal with the following system:

$a_1 = (1, -2, 0, \ldots, 0, 0),$
$a_2 = (0, 1, -2, \ldots, 0, 0),$
$\ldots$  (22.7)
$a_{k-1} = (0, 0, 0, \ldots, 1, -2),$
$a_k = (0, 0, 0, \ldots, 0, 1).$

But this system is linearly independent.
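
The effect is easy to reproduce. The sketch below, in Python, builds the system (22.6) in exact rational arithmetic and compares its rank with the rank of the system in which the tiny coordinate of the last vector is replaced by zero. The function names and the value k = 60 are assumptions made for illustration, and the last line assumes the usual double-precision floating-point numbers.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of coordinate vectors, found by exact elimination."""
    rows = [[Fraction(c) for c in v] for v in vectors]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def vector_system(k, tiny):
    """Vectors a_1, ..., a_k; `tiny` is the first coordinate of a_k."""
    vectors = []
    for i in range(k - 1):
        v = [0] * k
        v[i], v[i + 1] = 1, -2
        vectors.append(v)
    last = [0] * k
    last[0], last[k - 1] = tiny, 1
    vectors.append(last)
    return vectors

k = 60
exact = vector_system(k, -Fraction(1, 2 ** (k - 1)))  # system (22.6)
rounded = vector_system(k, 0)                         # tiny coordinate dropped

rank_exact = rank(exact)      # k - 1: the vectors are linearly dependent
rank_rounded = rank(rounded)  # k: the vectors are linearly independent

# In double precision the tiny coordinate is lost as soon as it is added to 1.
absorbed = (1.0 + 2.0 ** -(k - 1)) == 1.0
```

A change in one coordinate far below any realistic measurement error thus changes the rank of the system.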

Thus small changes in the coordinates of vectors may result in
a linearly dependent system becoming, under an approximate assignment
of coordinates and approximate calculations using them,
linearly independent, and vice versa, a linearly independent system
becoming linearly dependent. But then it is only natural to ask what
is the practical importance of such notions as linear dependence,
rank, basis, compatible and incompatible system and in general of
everything we have so far investigated. There is no simple answer
to this question since it requires a deep understanding of the problems
one solves. It is with this question that the differences distinguishing
“exact” mathematics from “approximate” mathematics begin.


Exercises 

1. Prove that if system (22.2) is compatible, then it
has a unique solution if and only if the system of vectors $a_1, a_2, \ldots, a_m$ is
linearly independent.

2. Prove that if a system of vectors $a_1, a_2, \ldots, a_m$ has rank r, then
system (22.5) consists of r equations.

3. Assume that the solutions of a system are vectors of a space $P_m$. Let
$b = 0$ and let the system of vectors $a_1, a_2, \ldots, a_m$ have rank r. Prove that in
this case the set of all solutions of (22.2) forms an $(m - r)$-dimensional
subspace of $P_m$.

4. Find all solutions of the system of linear algebraic equations

$\sqrt{2}\, z_1 + 1 \cdot z_2 = \sqrt{3},$
$2 z_1 + \sqrt{2}\, z_2 = \sqrt{6}.$

Solve the same system giving $\sqrt{2}$, $\sqrt{3}$ and $\sqrt{6}$ to various accuracy. Compare
the results.

5. Establish the relation of the Gauss method to elementary transformations
of a system of vectors.




CHAPTER 3 


Measurements in Vector Space 


23. Affine coordinate systems 

An enormous number of science and engineering
problems require a precise description in space of various geometrical
objects such as points, figures, curves, surfaces and so on.
For a complex object it is very important to know not only a general
characterization of its location, such as indication of the centre of
gravity, but also the position of each of its individual points.

As an example recall that the prediction of lunar and solar eclipses 
is possible because we know the position of celestial bodies at every 
moment. Television broadcasts over great distances are possible 
because the position of each point of the image being transmitted 
is defined. 

It is obviously enough to give a method of describing the position
of one individual point, since any geometrical object can be
given as some collection of points. It is useful to
consider separately the position of a point on a straight line,
in the plane and in space, because a spatial description of an object is
by no means always appropriate. For example, a photograph can
obviously be considered only in the plane, while the motion of a
particle with no forces acting upon it can be considered on a straight
line.

One of the most common descriptions of the position of a point 
is based on a very simple idea. We have already noted that it is 
possible to establish a 1-1 correspondence between all points and 
fixed directed line segments. The description of the position of a 
point therefore can be replaced by the description of the position 
of the corresponding directed line segment. But the position of this 
line segment is characterized by its coordinates relative to any basis, 
i.e. by some ordered collections of numbers. Consequently, the 
position of a point must also be characterized by ordered collections 
of numbers. We now proceed to explore this idea. 

Given some straight line, fix on it an arbitrary point O and consider
the space $V_1$ of vectors lying on the given straight line and fixed
at the point O. Choose in that space some basis vector $a$. Now turn
the straight line into an axis by specifying on it a direction so that
the magnitude of the segment $a$ is positive (Fig. 23.1).

Fig. 23.1

The axis with the given point O and basis vector $a$ forms an affine
coordinate system on the straight line. The point O is called the origin,
and the length of the vector $a$ is the scale unit.

The position of any point M on the straight line is uniquely determined
by that of the vector $\overrightarrow{OM}$. The vectors $a$ and $\overrightarrow{OM}$ are collinear,
with $a \ne 0$, and so according to the consequence of Lemma 18.1
there is a real $\alpha$ such that

$\overrightarrow{OM} = \alpha a.$  (23.1)

That number is called an affine coordinate of the point M on the
straight line. The point M with a coordinate $\alpha$ is designated $M(\alpha)$.

Notice that with a fixed affine coordinate system on the straight line
relation (23.1) uniquely defines the affine coordinate $\alpha$ of
any point M of that straight line. Obviously the converse is also true.
Namely, relation (23.1) makes every number $\alpha$ uniquely
define some point M of the straight line. Thus given a fixed affine
coordinate system there is a 1-1 correspondence between all real
numbers and the points of a straight line.

Giving points by their coordinates allows us to calculate the
magnitudes of directed line segments and the distances between
points. Let $M_1(\alpha_1)$ and $M_2(\alpha_2)$ be given points. We have

$\{\overrightarrow{M_1M_2}\} = \{\overrightarrow{OM_2} - \overrightarrow{OM_1}\} = \{\alpha_2 a - \alpha_1 a\} = \{(\alpha_2 - \alpha_1) a\}$
$= (\alpha_2 - \alpha_1)\{a\} = (\alpha_2 - \alpha_1)\,|a|.$  (23.2)

If $\rho(M_1, M_2)$ denotes the distance between points $M_1$ and $M_2$, then

$\rho(M_1, M_2) = |\{\overrightarrow{M_1M_2}\}| = |\alpha_2 - \alpha_1|\,|a|.$  (23.3)

The formulas become particularly simple if the length of the basis
vector equals unity. In this case

$\{\overrightarrow{M_1M_2}\} = \alpha_2 - \alpha_1,$
$\rho(M_1, M_2) = |\alpha_2 - \alpha_1|.$  (23.4)
Now let some plane be given. Fix on it an arbitrary point O and
consider the space $V_2$ of vectors lying in the plane and fixed at O.
Choose in that space some pair of basis vectors $a$ and $b$. Specify
directions on the straight lines containing those vectors so that the
magnitudes of $a$ and $b$ are positive (Fig. 23.2).

The two axes in the plane intersecting at the same point O and
having the basis vectors $a$ and $b$ given on them form an affine coordinate
system in the plane. The axis containing $a$ is called the x axis or the
axis of abscissas; the axis containing $b$ is the y axis or the axis of
ordinates.

Again the position of any point M in the plane is uniquely determined
by the vector $\overrightarrow{OM}$ and in turn there is a unique vector decomposition
of the form

$\overrightarrow{OM} = \alpha a + \beta b.$  (23.5)

The real numbers $\alpha$ and $\beta$ are again called the affine coordinates of
the point M. The first coordinate is called the abscissa and the second
the ordinate of M. The point M with coordinates $\alpha$ and $\beta$ is
designated $M(\alpha, \beta)$.

On the x and y axes there are unique points $M_x$ and $M_y$ such that

$\overrightarrow{OM} = \overrightarrow{OM_x} + \overrightarrow{OM_y}.$  (23.6)

They are at the intersection of the coordinate axes with the straight
lines parallel to the axes and passing through M. We call them the
affine projections of the point M onto the coordinate axes. The vectors
$\overrightarrow{OM_x}$ and $\overrightarrow{OM_y}$ are the affine projections of $\overrightarrow{OM}$. From the uniqueness
of decompositions (23.5) and (23.6) we conclude that

$\overrightarrow{OM_x} = \alpha a, \quad \overrightarrow{OM_y} = \beta b.$  (23.7)

Thus if M has the coordinates $M(\alpha, \beta)$, then $M_x$ and $M_y$, as
points of the plane, have the coordinates $M_x(\alpha, 0)$ and $M_y(0, \beta)$.
Moreover if

$\overrightarrow{OM} = (\alpha, \beta),$

then

$\overrightarrow{OM_x} = (\alpha, 0), \quad \overrightarrow{OM_y} = (0, \beta).$

Every basis vector forms a proper coordinate system on its axis.
The points $M_x$ and $M_y$ therefore may be regarded as points of the
x and y axes given in those proper coordinate systems. It follows
from (23.7), however, that the coordinate of $M_x$ on the x axis is equal
to the abscissa of M. Similarly, the coordinate of $M_y$ on the y axis
is equal to the ordinate of M. Obvious as they may be, these assertions
are very important since they allow us to use formulas (23.2)
to (23.4).

Specifying an ordered pair of numbers $\alpha$ and $\beta$ uniquely defines
some point. Indeed, relations (23.7) allow one to construct in a
unique way the affine projections of a point that uniquely determine
the point in the plane. Consequently, given a fixed affine coordinate
system there is a 1-1 correspondence between all ordered pairs of real
numbers and points in the plane.
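
The decomposition (23.5) can be computed explicitly once the vectors are given numerically. The sketch below, in Python, assumes that the basis vectors and the vector from the origin to the point are themselves given by pairs of integers in some auxiliary Cartesian system, which the text does not require; it solves the resulting 2×2 system by Cramer's rule. The name `affine_coordinates` is illustrative.

```python
from fractions import Fraction

def affine_coordinates(a, b, om):
    """Coordinates (alpha, beta) with om = alpha*a + beta*b (Cramer's rule).

    a, b and om are pairs of integers.
    """
    det = a[0] * b[1] - a[1] * b[0]
    if det == 0:
        raise ValueError("a and b are collinear and do not form a basis")
    alpha = Fraction(om[0] * b[1] - om[1] * b[0], det)
    beta = Fraction(a[0] * om[1] - a[1] * om[0], det)
    return alpha, beta

a, b = (2, 0), (1, 1)
alpha, beta = affine_coordinates(a, b, (5, 3))

# Affine projections of the vector, as in (23.7).
om_x = (alpha * a[0], alpha * a[1])
om_y = (beta * b[0], beta * b[1])
```

Adding `om_x` and `om_y` componentwise gives back the original vector, in agreement with (23.6).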

Similarly we can introduce an affine coordinate system in space.
Fix a point O and consider the space $V_3$ of vectors fixed at O. Choose
in that space some triple of basis vectors $a$, $b$ and $c$. Specify directions
on the straight lines containing those vectors so that the magnitudes of
the directed line segments $a$, $b$ and $c$ are positive (Fig. 23.3).

Fig. 23.3

The three axes in space intersecting at the same point O and having the
basis vectors $a$, $b$ and $c$ given on them form an affine coordinate system
in space. The axis containing $a$ is called the x axis or axis of x coordinates or
abscissas, the axis containing $b$ is the y axis or axis of y coordinates or
ordinates and the third axis is the z axis or axis of z coordinates. Pairs of
coordinate axes determine the so-called coordinate planes designated
x, y; y, z; x, z planes.

The position of any point M in space is again uniquely determined
by the vector $\overrightarrow{OM}$ for which there is a unique decomposition

$\overrightarrow{OM} = \alpha a + \beta b + \gamma c.$

The real numbers $\alpha$, $\beta$ and $\gamma$ are called the affine coordinates of a
point M in space. The first coordinate is called an x coordinate or
abscissa, the second is called a y coordinate or ordinate and the third
is a z coordinate of M. The point M with coordinates $\alpha$, $\beta$ and $\gamma$ is
designated $M(\alpha, \beta, \gamma)$.

Draw through the point M the planes parallel to the coordinate
planes. The points of intersection of these planes with the x, y and z
axes are denoted by $M_x$, $M_y$ and $M_z$ and called the affine projections
of the point M onto the coordinate axes. The intersection of the coordinate
planes with pairs of planes passing through M determines points
$M_{yz}$, $M_{xz}$ and $M_{xy}$, called the affine projections of the point M onto
the coordinate planes. Accordingly the vectors $\overrightarrow{OM_{yz}}$, $\overrightarrow{OM_x}$ and
so on are called the affine projections of the vector $\overrightarrow{OM}$. It is obvious that

$\overrightarrow{OM} = \overrightarrow{OM_x} + \overrightarrow{OM_y} + \overrightarrow{OM_z},$
$\overrightarrow{OM_{yz}} = \overrightarrow{OM_y} + \overrightarrow{OM_z},$
$\overrightarrow{OM_{xz}} = \overrightarrow{OM_x} + \overrightarrow{OM_z},$
$\overrightarrow{OM_{xy}} = \overrightarrow{OM_x} + \overrightarrow{OM_y}.$

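We conclude, as in the case of the plane, that if a point M has the
coordinates

$M(\alpha, \beta, \gamma),$

then the affine projections of that point will have the coordinates:

$M_x(\alpha, 0, 0), \quad M_y(0, \beta, 0), \quad M_z(0, 0, \gamma),$
$M_{yz}(0, \beta, \gamma), \quad M_{xz}(\alpha, 0, \gamma), \quad M_{xy}(\alpha, \beta, 0).$

Similarly, if

$\overrightarrow{OM} = (\alpha, \beta, \gamma),$

then

$\overrightarrow{OM_x} = (\alpha, 0, 0), \quad \overrightarrow{OM_y} = (0, \beta, 0), \quad \overrightarrow{OM_z} = (0, 0, \gamma),$
$\overrightarrow{OM_{yz}} = (0, \beta, \gamma), \quad \overrightarrow{OM_{xz}} = (\alpha, 0, \gamma), \quad \overrightarrow{OM_{xy}} = (\alpha, \beta, 0).$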

Again every basis vector and every pair of basis vectors form proper
affine systems on the coordinate axes and coordinate planes.
And again the coordinates of points in those systems coincide with
the affine coordinates of the same points regarded as points in space.
Now, given a fixed affine coordinate system there is a 1-1 correspondence
between all ordered triples of real numbers and points in space.

Of the affine coordinate systems on the straight line, in the plane
and in space the most widely used are the so-called rectangular
Cartesian coordinate systems. They are characterized by all basis
vectors having unit length and the coordinate axes being mutually
perpendicular in the case of the plane and space. In a Cartesian
system basis vectors are usually denoted by $i$, $j$ and $k$. In what follows
we shall use only these systems as a rule.


Exercises 

1. Which of the points $A(\alpha)$ and $B(-\alpha)$ is to the
right of the other on the coordinate axis of Fig. 23.1?

2. What is the locus of points $M(\alpha, \beta, \gamma)$ for which the affine projections $M_{xy}$
have coordinates $M_{xy}(-3, 2, 0)$?

3. Do the coordinates of a point depend on the choice of direction on the
coordinate axes?

4. How will the coordinates of a point change if the length of the basis vectors
is changed?

5. What coordinates has the centre of a parallelepiped if the origin coincides
with one of its vertices and the basis vectors coincide with its edges?


24. Other coordinate systems 


Coordinate systems used in mathematics allow
us to give with the aid of numbers the position of any point in space,
in the plane or on a straight line. This makes it possible to carry
out any calculations with coordinates and, what is very important,
to apply modern computers not only to all sorts of numerical computations
but also to the solution of geometrical problems and to the
investigation of any geometrical objects and relations. Besides the affine
coordinate systems considered above some others are not infrequently used.

The polar coordinate system. Choose in the plane some straight line and fix
on it a Cartesian system. Call the origin O of the system the pole and the
coordinate axis the polar axis. Assume further that the unit segment of the
coordinate system on the straight line is used to measure the lengths of any
line segments in the plane. Consider an arbitrary point M in the plane. It is
obvious that its position will be completely defined if we specify the distance $\rho$
between the points M and O and the angle $\varphi$ through which it is necessary
to turn the ray Ox counterclockwise about the point O until its direction
coincides with that of the line segment OM (Fig. 24.1).

The polar coordinates of the point M in the plane are the two num¬ 
bers p and <p. The number p is the polar radius and the number <p 
is the polar angle. It is usually assumed that 

0 n <■" 4-no 0 m 9 tt ( 94 1 ^ 



If the point M coincides with the pole 0, then the polar angle is 
considered to be undefined. 

Associated in a natural way with every polar coordinate system
is a rectangular Cartesian system. In this system the origin coincides with
the pole, the axis of abscissas coincides with the polar axis and the
axis of ordinates is obtained by rotating the polar axis through an
angle of $\pi/2$ about $O$.

Denote the coordinates of the point $M$ in the Cartesian $x, y$ system
by $\alpha$ and $\beta$. We have the obvious formulas

    $\alpha = \rho \cos\varphi, \quad \beta = \rho \sin\varphi.$

From these we obtain the inverse relations

    $\rho^2 = \alpha^2 + \beta^2, \quad \cos\varphi = \frac{\alpha}{(\alpha^2 + \beta^2)^{1/2}}, \quad \sin\varphi = \frac{\beta}{(\alpha^2 + \beta^2)^{1/2}}.$

They allow us to calculate the polar coordinates of a point from its
Cartesian coordinates and vice versa.
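
The conversion between polar and Cartesian coordinates can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and do not come from the text, the polar angle is computed with the standard `math.atan2` and reduced to the usual range from 0 (inclusive) to 2 pi, and `None` is used here as a stand-in for the undefined polar angle at the pole.

```python
import math

def polar_to_cartesian(rho, phi):
    """Return (alpha, beta) for polar radius rho and polar angle phi in radians."""
    return rho * math.cos(phi), rho * math.sin(phi)

def cartesian_to_polar(alpha, beta):
    """Return (rho, phi); phi is None at the pole, where the angle is undefined."""
    rho = math.hypot(alpha, beta)          # rho = (alpha^2 + beta^2)^(1/2)
    if rho == 0.0:
        return 0.0, None
    # atan2 returns an angle in (-pi, pi]; the remainder shifts it to [0, 2*pi).
    phi = math.atan2(beta, alpha) % (2.0 * math.pi)
    return rho, phi
```

For instance, the point with Cartesian coordinates (-1, -1) lies in the third quadrant, so its polar angle comes out as 5 pi / 4 rather than the negative value returned directly by `atan2`.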

Cylindrical coordinates. Choose in space some plane $\pi$ and fix on
it a polar coordinate system. Through the pole $O$ draw the $z$ axis
perpendicular to $\pi$ (Fig. 24.2). Assume again that to measure the
lengths of any line segments in space we use the same unit segment.
Introduce in $\pi$ the Cartesian system corresponding to the polar system.
Together with the $z$ axis it forms a Cartesian system in space.

Consider the projections $M_z$ and $M_{xy}$ of the point $M$ onto the $z$
axis and the $x, y$ plane. The point $M_{xy}$ as a point of $\pi$ has polar
coordinates $\rho$ and $\varphi$. The point $M_z$ as a point of the $z$ axis has a
coordinate $z$.

The cylindrical coordinates of the point $M$ in space are the three
numbers $\rho$, $\varphi$ and $z$. It is again assumed that

    $0 \le \rho < +\infty, \quad 0 \le \varphi < 2\pi.$

For the points of the $z$ axis the angle $\varphi$ is not defined.

Cartesian coordinates in the $x, y, z$ system and cylindrical coordinates
are connected by the relations

    $x = \rho \cos\varphi, \quad y = \rho \sin\varphi, \quad z = z.$

Spherical coordinates. Consider in space a Cartesian $x, y, z$ system
and the corresponding polar coordinate system in the $x, y$ plane
(Fig. 24.3). Let $M$ be any point in space other than $O$, and let $M_{xy}$
be its projection onto the $x, y$ plane. Denote by $\rho$ the distance from
$M$ to $O$, by $\theta$ the angle between the vector $\overrightarrow{OM}$ and the basis vector
of the $z$ axis, and finally by $\varphi$ the polar angle of the projection $M_{xy}$.

The spherical coordinates of the point $M$ in space are the three
numbers $\rho$, $\varphi$ and $\theta$. The number $\rho$ is the radius vector, the number $\varphi$ is
the longitude, and the number $\theta$ is the colatitude. It is assumed that

    $0 \le \rho < +\infty, \quad 0 \le \varphi < 2\pi, \quad 0 \le \theta \le \pi.$

The longitude is undefined for all the points of the $z$ axis, and the
colatitude is not defined for the point $O$.

Cartesian coordinates in the $x, y, z$ system and spherical coordinates
are connected by the relations

    $x = \rho \sin\theta \cos\varphi, \quad y = \rho \sin\theta \sin\varphi, \quad z = \rho \cos\theta.$
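
The relations for cylindrical and spherical coordinates translate directly into code. The following Python sketch is an illustration under stated assumptions: the function names are hypothetical, angles are in radians, and `None` marks a coordinate that the text declares undefined (the longitude on the z axis, and both angles at the origin).

```python
import math

def cylindrical_to_cartesian(rho, phi, z):
    """x = rho cos(phi), y = rho sin(phi), z = z."""
    return rho * math.cos(phi), rho * math.sin(phi), z

def spherical_to_cartesian(rho, phi, theta):
    """phi is the longitude, theta the colatitude (angle from the z axis)."""
    s = rho * math.sin(theta)
    return s * math.cos(phi), s * math.sin(phi), rho * math.cos(theta)

def cartesian_to_spherical(x, y, z):
    """Return (rho, phi, theta); undefined angles are reported as None."""
    rho = math.sqrt(x * x + y * y + z * z)
    if rho == 0.0:
        return 0.0, None, None             # neither angle is defined at O
    # Clamp the ratio to guard against rounding slightly outside [-1, 1].
    theta = math.acos(max(-1.0, min(1.0, z / rho)))
    if x == 0.0 and y == 0.0:
        return rho, None, theta            # longitude undefined on the z axis
    phi = math.atan2(y, x) % (2.0 * math.pi)
    return rho, phi, theta
```

Converting a point to spherical coordinates and back should reproduce the original Cartesian coordinates up to rounding, which gives a quick way to check the formulas.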


Exercises 

1. Construct a curve whose points in polar coordinates
satisfy the relation $\rho = \cos 3\varphi$.

2. Construct a curve whose points in cylindrical coordinates satisfy the
relations $\rho = \varphi^{-1}$ and $z = \varphi$.

3. Construct a surface whose points in spherical coordinates satisfy the
relations $0 < \rho < 1$, $\varphi = \pi/2$ and $0 < \theta < \pi/2$.


25. Some problems 

Consider several simple problems in applying 
Cartesian systems. For definiteness we shall consider problems in 
space. Similar problems in the plane differ from those in space only 
in minor details. It will be assumed throughout that some coordinate
system is fixed whose origin is a point $O$ and whose basis vectors
are $i$, $j$ and $k$.

The coordinates of a vector. Let $M_1(\alpha_1, \beta_1, \gamma_1)$ and $M_2(\alpha_2, \beta_2, \gamma_2)$
be two points in space. They determine a vector $\overrightarrow{M_1M_2}$ which has
some coordinates relative to the basis $i$, $j$ and $k$. We establish the
relation between the coordinates of $\overrightarrow{M_1M_2}$ and those of the points
$M_1$ and $M_2$. We have

    $\overrightarrow{M_1M_2} = \overrightarrow{OM_2} - \overrightarrow{OM_1}.$

Further, by the definition of the affine coordinates of $M_1$ and $M_2$,

    $\overrightarrow{OM_1} = \alpha_1 i + \beta_1 j + \gamma_1 k, \quad \overrightarrow{OM_2} = \alpha_2 i + \beta_2 j + \gamma_2 k.$

Therefore it follows that

    $\overrightarrow{M_1M_2} = (\alpha_2 - \alpha_1) i + (\beta_2 - \beta_1) j + (\gamma_2 - \gamma_1) k$

or, according to the accepted notation,

    $\overrightarrow{M_1M_2} = (\alpha_2 - \alpha_1, \beta_2 - \beta_1, \gamma_2 - \gamma_1).$    (25.1)


Coordinate projections of a vector. Again consider the directed
line segment $\overrightarrow{M_1M_2}$ in space. On projecting the points $M_1$ and $M_2$
onto the same coordinate plane or the same coordinate axis we obtain
a new directed line segment. It is called a coordinate projection of the
vector $\overrightarrow{M_1M_2}$.

Every vector in space has six coordinate projections: three projections
onto the coordinate axes and three projections onto the
coordinate planes. It is easy to find the coordinate projections in
the basis $i$, $j$ and $k$ from the coordinates of the points $M_1(\alpha_1, \beta_1, \gamma_1)$ and
$M_2(\alpha_2, \beta_2, \gamma_2)$. To do this it suffices to use formulas (23.8) and (25.1).

For example, let us find the coordinates of the projection $\overrightarrow{M_{1x}M_{2x}}$.
Considering that the points $M_{1x}$ and $M_{2x}$ have the coordinates

    $M_{1x}(\alpha_1, 0, 0), \quad M_{2x}(\alpha_2, 0, 0),$

we find that

    $\overrightarrow{M_{1x}M_{2x}} = (\alpha_2 - \alpha_1, 0, 0).$

Similarly

    $\overrightarrow{M_{1xz}M_{2xz}} = (\alpha_2 - \alpha_1, 0, \gamma_2 - \gamma_1)$    (25.2)

and so on for all the remaining projections.

Comparing the first of the formulas (23.4) with formulas of the
type (25.2) we conclude that

    $\{\overrightarrow{M_{1x}M_{2x}}\} = \alpha_2 - \alpha_1, \quad \{\overrightarrow{M_{1y}M_{2y}}\} = \beta_2 - \beta_1, \quad \{\overrightarrow{M_{1z}M_{2z}}\} = \gamma_2 - \gamma_1.$

Therefore the magnitudes of the projections of a vector onto the
coordinate axes coincide with the coordinates of that vector.

The second of the formulas (23.4) allows us to calculate the lengths
of the projections of $\overrightarrow{M_1M_2}$ onto the coordinate axes from the
coordinates of $M_1$ and $M_2$. Namely,

    $|\overrightarrow{M_{1x}M_{2x}}| = |\alpha_2 - \alpha_1|, \quad |\overrightarrow{M_{1y}M_{2y}}| = |\beta_2 - \beta_1|, \quad |\overrightarrow{M_{1z}M_{2z}}| = |\gamma_2 - \gamma_1|.$

The length of a vector. We establish a formula for the length of
a vector in space. It is obvious that the length $|\overrightarrow{M_1M_2}|$ of $\overrightarrow{M_1M_2}$
equals the distance $\rho(M_1, M_2)$ between $M_1$ and $M_2$ and the length
of the diagonal of the rectangular parallelepiped whose faces are
parallel to the coordinate planes and pass through $M_1$ and $M_2$
(Fig. 25.1). The length of any edge of the parallelepiped equals that
of the projection of the vector $\overrightarrow{M_1M_2}$ onto the coordinate axis parallel
to the edge. Using therefore the Pythagorean theorem we conclude
that

    $|\overrightarrow{M_1M_2}| = (|\overrightarrow{M_{1x}M_{2x}}|^2 + |\overrightarrow{M_{1y}M_{2y}}|^2 + |\overrightarrow{M_{1z}M_{2z}}|^2)^{1/2}.$

If $M_1$ and $M_2$ are now given by their coordinates $M_1(\alpha_1, \beta_1, \gamma_1)$
and $M_2(\alpha_2, \beta_2, \gamma_2)$, then

    $\rho(M_1, M_2) = ((\alpha_2 - \alpha_1)^2 + (\beta_2 - \beta_1)^2 + (\gamma_2 - \gamma_1)^2)^{1/2}.$    (25.3)

If the vector $\overrightarrow{M_1M_2}$ is given by its coordinates $x$, $y$ and $z$ relative to
the basis $i$, $j$ and $k$, then

    $|\overrightarrow{M_1M_2}| = (x^2 + y^2 + z^2)^{1/2}.$    (25.4)

The corresponding formulas for the plane are of similar form. If the
points $M_1(\alpha_1, \beta_1)$ and $M_2(\alpha_2, \beta_2)$ or the vector $\overrightarrow{M_1M_2} = (x, y)$ are
given by their coordinates, then

    $\rho(M_1, M_2) = ((\alpha_2 - \alpha_1)^2 + (\beta_2 - \beta_1)^2)^{1/2}, \quad |\overrightarrow{M_1M_2}| = (x^2 + y^2)^{1/2}.$

The angle between vectors. Consider in space nonzero vectors
$a$ and $b$. Apply them to a point $O$. Denote by $\pi$ the plane passing
through $O$ and containing both vectors. The angle between $a$ and $b$
is the smallest angle through which one of the vectors must be turned
about $O$ in the plane $\pi$ for its direction to coincide with that of the
other vector. If at least one of the vectors is zero, then the angle is
undefined. Our task is to calculate the cosine of the angle between the
vectors from the coordinates of the vectors. We denote this cosine by
$\cos\{a, b\}$.

Denote by $A$ and $B$ the terminal points of $a$ and $b$ in $\pi$. It is obvious
that the angle between $a$ and $b$ is nothing other than the angle $AOB$ of the
triangle $AOB$ whose sides are the vectors $a$, $b$ and $b - a$ (Fig. 25.2).

Suppose $a$ and $b$ are given by their coordinates

    $a = (x_1, y_1, z_1), \quad b = (x_2, y_2, z_2).$

Then

    $b - a = (x_2 - x_1, y_2 - y_1, z_2 - z_1).$

As is known from elementary geometry, the square of the length
of one side of a triangle is equal to the sum of the squares of the
lengths of its other two sides minus the double product of the lengths
of those sides by the cosine of the angle between them. Therefore

    $|b - a|^2 = |a|^2 + |b|^2 - 2 |a| |b| \cos\{a, b\}$

or, taking into account formula (25.4),

    $(x_2 - x_1)^2 + (y_2 - y_1)^2 + (z_2 - z_1)^2 = x_1^2 + y_1^2 + z_1^2 + x_2^2 + y_2^2 + z_2^2 - 2 (x_1^2 + y_1^2 + z_1^2)^{1/2} (x_2^2 + y_2^2 + z_2^2)^{1/2} \cos\{a, b\}.$

Performing elementary transformations we find

    $\cos\{a, b\} = \frac{x_1 x_2 + y_1 y_2 + z_1 z_2}{(x_1^2 + y_1^2 + z_1^2)^{1/2} (x_2^2 + y_2^2 + z_2^2)^{1/2}}.$    (25.5)

Changes in the formula for the plane are obvious.
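
Formulas (25.1), (25.4) and (25.5) can be tried out numerically. The Python sketch below is illustrative only; the function names are hypothetical, and vectors and points are represented as plain tuples of Cartesian coordinates.

```python
import math

def vector_between(m1, m2):
    """Coordinates of the vector M1M2 by (25.1): end point minus initial point."""
    return tuple(q - p for p, q in zip(m1, m2))

def length(v):
    """Length of a vector by (25.4)."""
    return math.sqrt(sum(c * c for c in v))

def cos_angle(a, b):
    """Cosine of the angle between nonzero vectors a and b by (25.5)."""
    la, lb = length(a), length(b)
    if la == 0.0 or lb == 0.0:
        raise ValueError("the angle is undefined when a vector is zero")
    return sum(p * q for p, q in zip(a, b)) / (la * lb)
```

The same functions work unchanged for vectors in the plane, since the tuples may have two components instead of three.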

Dividing a line segment in a given ratio. Suppose a straight line
is given in space and let $M_1$ and $M_2$ be two distinct points on that
straight line. Choose a positive direction on the straight line. On
the resulting axis $M_1$ and $M_2$ determine a directed line segment $\overrightarrow{M_1M_2}$.
Let $M$ be any point on the axis other than $M_2$. The number

    $\lambda = \frac{\{\overrightarrow{M_1M}\}}{\{\overrightarrow{MM_2}\}}$    (25.6)

is called the ratio in which the point $M$ divides the directed line segment
$\overrightarrow{M_1M_2}$.

Changing the direction on the axis makes the numbers $\{\overrightarrow{M_1M}\}$
and $\{\overrightarrow{MM_2}\}$ change sign simultaneously. Hence ratio (25.6) is
independent of the choice of positive direction on the axis. Also when
changing the scale of measuring the length of line segments on the
axis the numbers $\{\overrightarrow{M_1M}\}$ and $\{\overrightarrow{MM_2}\}$ are multiplied by the same
number. Hence ratio (25.6) is independent of the choice of unit
length. It follows that ratio (25.6) is independent of the choice of
coordinate system on the axis.

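The problem is to calculate the coordinates of the point $M$ dividing
$\overrightarrow{M_1M_2}$ in a ratio $\lambda$ given the coordinates of the points $M_1$ and $M_2$
and the number $\lambda$, with $\lambda \ne -1$. So let $M_1(\alpha_1, \beta_1, \gamma_1)$ and
$M_2(\alpha_2, \beta_2, \gamma_2)$ be given and let $M(\alpha, \beta, \gamma)$ be unknown. Project
these points onto the coordinate axes, for example the $x$ axis (Fig. 25.3).
It is clear from similarity considerations that the point $M_x$ also divides
the directed line segment $\overrightarrow{M_{1x}M_{2x}}$ in the ratio $\lambda$. Therefore

    $\lambda = \frac{\{\overrightarrow{M_{1x}M_x}\}}{\{\overrightarrow{M_xM_{2x}}\}}.$    (25.7)

By formula (23.4) $\{\overrightarrow{M_{1x}M_x}\} = \alpha - \alpha_1$ and $\{\overrightarrow{M_xM_{2x}}\} = \alpha_2 - \alpha$.
Now taking into account (25.7), we have $\lambda (\alpha_2 - \alpha) = \alpha - \alpha_1$, whence
$\alpha = (\alpha_1 + \lambda\alpha_2)/(1 + \lambda)$.

The calculation of the coordinates $\beta$ and $\gamma$ is similar. So

    $\alpha = \frac{\alpha_1 + \lambda\alpha_2}{1 + \lambda}, \quad \beta = \frac{\beta_1 + \lambda\beta_2}{1 + \lambda}, \quad \gamma = \frac{\gamma_1 + \lambda\gamma_2}{1 + \lambda}.$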
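
The formulas for the dividing point are easy to check on examples. The following Python sketch is an illustration with a hypothetical function name; it applies the same expression to every coordinate and rejects the excluded ratio of -1.

```python
def divide_segment(m1, m2, lam):
    """Coordinates of the point dividing the directed segment M1M2 in ratio lam.

    Each coordinate is (c1 + lam * c2) / (1 + lam); lam = -1 is excluded
    because no point of the straight line corresponds to it.
    """
    if lam == -1:
        raise ValueError("no point divides the segment in the ratio -1")
    return tuple((p + lam * q) / (1 + lam) for p, q in zip(m1, m2))
```

With a ratio of 1 the function returns the midpoint, with a ratio of 0 it returns the point M1, and a ratio of 2 gives the point lying two thirds of the way from M1 to M2.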


Notice that $\lambda > 0$ if $M$ is inside the line segment $M_1M_2$, $\lambda < 0$
if $M$ is outside $M_1M_2$, and $\lambda = 0$ if $M$ coincides with $M_1$. If $M$
moves from $M_1$ to $M_2$ (excluding the coincidence with $M_2$), the
ratio $\lambda$ first takes on a zero value and then all possible positive
values successively in increasing order. If $M$ moves from $M_1$ in the
positive direction of the axis (see Fig. 25.3), then $\lambda$ first assumes a
zero value and then negative values in decreasing order approaching
arbitrarily closely $\lambda = -1$ but always remaining greater than
$\lambda = -1$. If $M$ moves in the negative direction from $M_2$, then $\lambda$
takes on all possible negative values in increasing order but always
remains less than $\lambda = -1$.

Thus a 1-1 correspondence could be established between all real
numbers and the points on the straight line if the straight line
contained a point $M$ dividing the line segment $M_1M_2$ in the ratio
$\lambda = -1$ and the point $M$ coinciding with $M_2$ could be assigned
some number. This question is usually solved by joining to the
straight line an ideal extra "point" and by joining to the numbers an
ideal extra "number". Such a point is called a "point at infinity"
and such a number is called an "infinitely large" number.

Orthogonal projections of a vector. Let $u$ be some axis in space
and let $\overrightarrow{AB}$ be a directed line segment. Draw through the points
$A$ and $B$ the planes perpendicular to $u$ (Fig. 25.4). The intersection
of the planes with the axis determines the points $A_u$ and $B_u$, with
$A_u$ lying in the same plane as $A$ and $B_u$ in the same plane as $B$.
The directed line segment $\overrightarrow{A_uB_u}$ is called the orthogonal projection
of $\overrightarrow{AB}$ onto the axis $u$. The following notation is used to denote it:

    $\overrightarrow{A_uB_u} = \mathrm{pr}_u \overrightarrow{AB}.$

For a fixed axis $u$ every vector $x$ in space uniquely determines its
orthogonal projection $x'$. It may be assumed therefore that some
"function"

    $x' = \mathrm{pr}_u x$    (25.8)

is given whose "independent variable" may be any vector in space
and whose "value" is a vector on the axis $u$. We shall now prove
that this function possesses the following properties:

    $\mathrm{pr}_u (x + y) = \mathrm{pr}_u x + \mathrm{pr}_u y,$
    $\mathrm{pr}_u (\lambda x) = \lambda\, \mathrm{pr}_u x,$    (25.9)

true for any vectors $x$ and $y$ and any number $\lambda$.

Indeed, fix some Cartesian system in which the axis $u$ coincides
with the axis of abscissas. In that system let

    $x = (\alpha_1, \beta_1, \gamma_1),$
    $y = (\alpha_2, \beta_2, \gamma_2);$

then

    $x + y = (\alpha_1 + \alpha_2, \beta_1 + \beta_2, \gamma_1 + \gamma_2),$
    $\lambda x = (\lambda\alpha_1, \lambda\beta_1, \lambda\gamma_1).$

In the chosen coordinate system the orthogonal projection of a
vector onto the axis $u$ coincides with its coordinate projection onto
the axis of abscissas. As already noted earlier, the first coordinate
of the projection of any vector onto the axis of abscissas coincides
with the first coordinate of the vector, the other coordinates being
zero. Therefore


    $\mathrm{pr}_u (x + y) = (\alpha_1 + \alpha_2, 0, 0),$
    $\mathrm{pr}_u (\lambda x) = (\lambda\alpha_1, 0, 0),$
    $\mathrm{pr}_u x = (\alpha_1, 0, 0),$
    $\mathrm{pr}_u y = (\alpha_2, 0, 0).$    (25.10)

According to the rules of vector addition and of multiplication of
vectors by a number, it follows from the last two equations of (25.10)
that

    $\mathrm{pr}_u x + \mathrm{pr}_u y = (\alpha_1 + \alpha_2, 0, 0),$
    $\lambda\, \mathrm{pr}_u x = (\lambda\alpha_1, 0, 0).$

Comparing the right-hand sides of these with the right-hand sides of
the first two equations of (25.10) we see that both properties of
(25.9) are true.

Now let $\pi$ be some plane in space and let $\overrightarrow{AB}$ be a directed line
segment. On dropping perpendiculars from the points $A$ and $B$ to $\pi$
we obtain in $\pi$ two points $A_\pi$ and $B_\pi$ which determine a directed
line segment $\overrightarrow{A_\pi B_\pi}$. This is called the orthogonal projection of $\overrightarrow{AB}$
onto $\pi$. The same notation is used to denote it, i.e.

    $\overrightarrow{A_\pi B_\pi} = \mathrm{pr}_\pi \overrightarrow{AB}.$

Of course, for orthogonal projections onto the same plane we have
relations similar to (25.9). To prove this we may fix some Cartesian
system where $\pi$ is a coordinate plane and use again the corresponding
properties of projections onto a coordinate plane.

We have discussed orthogonal projections of vectors in space. 
Undoubtedly a close analogy holds for vectors in the plane. 


Exercises 

1. Two nonzero vectors are given by their Cartesian 
coordinates. When are they perpendicular? 

2. Find the coordinates of the centre of gravity of three particles, given their 
Cartesian coordinates and masses. 

3. Find the area of a triangle, given the Cartesian coordinates of its three 
vertices. 

4. Let $x$, $a$, $b$ and $c$ be nonzero vectors in space, with $a$, $b$ and $c$ mutually
perpendicular. Prove that

    $\cos^2\{x, a\} + \cos^2\{x, b\} + \cos^2\{x, c\} = 1.$

5. Denote by $\pi$ any coordinate plane, and by $u$ any coordinate axis
in $\pi$. Prove that for any vector $x$

    $\mathrm{pr}_u (\mathrm{pr}_\pi x) = \mathrm{pr}_u x.$


26. Scalar product

The use of directed line segments to repre¬ 
sent forces and displacements leads to a very important notion of a 
scalar product of vectors. 

It is known from physics that if a vector $a$ represents a force whose
point of application moves from the initial point of a vector $b$ to its
terminal point, then the work $\omega$ of such a force is defined by the
equation

    $\omega = |a| |b| \cos\{a, b\}.$    (26.1)

The right-hand side of the equation is called the scalar product of
$a$ and $b$ and is generally designated $(a, b)$. So

    $(a, b) = |a| |b| \cos\{a, b\}.$    (26.2)

Strictly speaking, this definition of a scalar product refers only 
to nonzero vectors a and b, since it is only for such vectors that an 
angle is defined. Taking into account the above interpretation of the 
scalar product, however, it is easy to see that it must be defined 
when at least one of the vectors is zero. If either a force or a displace¬ 
ment is given by a zero vector, then the work done equals zero. It 
will be assumed therefore that (a, b) = 0 if at least one of the vectors 
a and b is zero. 

Formula (26.2) yields some geometrical properties of a scalar 
product. For example, the angle between two nonzero vectors is 
acute (obtuse) if and only if the scalar product of the vectors is posi¬ 
tive (negative). 

If the angle between two vectors is a right angle or at least one of
the vectors is zero, then the scalar product of the vectors equals
zero. Such vectors are called orthogonal.

Orthogonal vectors of unit length are called orthonormal vectors.
In particular, the basis vectors $i$, $j$ and $k$ of a Cartesian system
are orthonormal. It follows from formula (26.2) that

    $(i, i) = 1, \quad (i, j) = 0, \quad (i, k) = 0,$
    $(j, i) = 0, \quad (j, j) = 1, \quad (j, k) = 0,$    (26.3)
    $(k, i) = 0, \quad (k, j) = 0, \quad (k, k) = 1.$

Consider nonzero vectors $a$ and $b$. Draw through $a$ an axis $u$,
specifying on it a direction such that the magnitude of $a$ is positive.
It is then obvious that

    $\{\mathrm{pr}_u b\} = |b| \cos\{a, b\}.$

The projection of $b$ onto the axis thus constructed is called
the projection of the vector $b$ onto the vector $a$ and designated $\mathrm{pr}_a b$.
Of course, the projection of one vector onto another preserves
properties (25.9). In the new symbols

    $(a, b) = |a| \{\mathrm{pr}_a b\} = |b| \{\mathrm{pr}_b a\}.$    (26.4)

These formulas establish very important algebraic properties of
a scalar product. Namely, for any vectors $a$, $b$ and $c$ and any real
number $\alpha$

    (1) $(a, b) = (b, a)$,
    (2) $(\alpha a, b) = \alpha (a, b)$,
    (3) $(a + b, c) = (a, c) + (b, c)$,                    (26.5)
    (4) $(a, a) > 0$ for $a \ne 0$; $(0, 0) = 0$.

Notice that relations (26.5) clearly hold if at least one of the
vectors is zero. In the general case Properties 1 and 4 follow immediately
from (26.2). To establish Properties 2 and 3 we use formulas
(26.4) and the properties of projections. We have

    $(\alpha a, b) = |b| \{\mathrm{pr}_b (\alpha a)\} = |b| \{\alpha\, \mathrm{pr}_b a\} = \alpha |b| \{\mathrm{pr}_b a\} = \alpha (a, b),$

    $(a + b, c) = |c| \{\mathrm{pr}_c (a + b)\} = |c| \{\mathrm{pr}_c a + \mathrm{pr}_c b\} = |c| \{\mathrm{pr}_c a\} + |c| \{\mathrm{pr}_c b\} = (a, c) + (b, c).$

Properties 2 and 3 are associated with the first factor of a scalar
product only. Similar properties hold for the second factor. Indeed,

    $(a, \alpha b) = (\alpha b, a) = \alpha (b, a) = \alpha (a, b),$
    $(a, b + c) = (b + c, a) = (b, a) + (c, a) = (a, b) + (a, c).$

In addition, by virtue of the equation $a - b = a + (-1) b$,

    $(a - b, c) = (a, c) - (b, c),$
    $(a, b - c) = (a, b) - (a, c),$

since

    $(a - b, c) = (a + (-1) b, c) = (a, c) + ((-1) b, c) = (a, c) + (-1)(b, c) = (a, c) - (b, c).$

Theorem 26.1. If two vectors $a$ and $b$ are given by their Cartesian
coordinates, then the scalar product of those vectors is equal to the sum
of pairwise products of the corresponding coordinates.

Proof. Suppose for definiteness that the vectors are given in space,
i.e. $a = (x_1, y_1, z_1)$ and $b = (x_2, y_2, z_2)$. Since

    $a = x_1 i + y_1 j + z_1 k,$
    $b = x_2 i + y_2 j + z_2 k,$

carrying out algebraic transformations of the scalar product we find

    $(a, b) = x_1 x_2 (i, i) + x_1 y_2 (i, j) + x_1 z_2 (i, k) + y_1 x_2 (j, i) + y_1 y_2 (j, j) + y_1 z_2 (j, k) + z_1 x_2 (k, i) + z_1 y_2 (k, j) + z_1 z_2 (k, k).$

Now by (26.3) we have

    $(a, b) = x_1 x_2 + y_1 y_2 + z_1 z_2$    (26.6)

and the theorem is proved.

Formula (26.6) allows expressions (25.4) and (25.5) obtained
earlier for the length of a vector and the angle between vectors to be
written in terms of a scalar product. Namely,

    $|a| = (a, a)^{1/2}, \quad \cos\{a, b\} = \frac{(a, b)}{|a| |b|}.$    (26.7)

It may seem that these formulas are trivial, since they follow
immediately from (26.2), without any reference to formulas (25.4)
and (25.5). We shall not jump to conclusions but note a very important
fact.

Notice that as a matter of fact our investigation proceeded in
three steps. Using (26.2) we first proved that properties (26.5) are
true. Then relying only on these properties and the orthonormality
of the basis vectors of a coordinate system we established formula
(26.6). And finally we obtained formulas (26.7) by using formulas
(25.4) and (25.5) which were derived without using the concept of
a scalar product of vectors.

Starting from this we could now introduce the scalar product
not by giving its explicit form but axiomatically, as some numerical
function defined for every pair of vectors, requiring that properties
(26.5) should necessarily hold. Relation (26.6) would then hold for
any coordinate systems where the basis vectors are orthonormal in
the sense of the axiomatic scalar product. Consequently, bearing
in mind the model of a Cartesian system we could axiomatically
assume that the lengths of vectors and the angles between them are
calculated from formulas (26.7). It would be necessary of course to
show that the lengths and angles introduced in this way possess the
necessary properties.
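
The way lengths, angles and projections all reduce to the scalar product can be illustrated with a short Python sketch. It is not part of the text: the function names are hypothetical, vectors are tuples of Cartesian coordinates, and the scalar product is computed by formula (26.6); the other quantities then follow from (26.7) and (26.4).

```python
import math

def scalar_product(a, b):
    """(a, b) as the sum of pairwise products of coordinates, formula (26.6)."""
    return sum(p * q for p, q in zip(a, b))

def length(a):
    """|a| = (a, a)^(1/2), the first formula of (26.7)."""
    return math.sqrt(scalar_product(a, a))

def cos_angle(a, b):
    """cos{a, b} = (a, b) / (|a| |b|) for nonzero a and b, formula (26.7)."""
    return scalar_product(a, b) / (length(a) * length(b))

def projection_magnitude(b, a):
    """Magnitude {pr_a b} of the projection of b onto nonzero a, from (26.4)."""
    return scalar_product(a, b) / length(a)
```

Only `scalar_product` refers to coordinates; the remaining functions use nothing but the scalar product itself, which mirrors the axiomatic point of view described above.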


Exercises 

1. Given two vectors $a$ and $b$, under what conditions on
the number $\alpha$ are the vectors $a$ and $b + \alpha a$ orthogonal? What is the geometrical
interpretation of the problem?

2. Let $a$ be a vector given in $V_3$ by its Cartesian coordinates. Find two linearly
independent vectors orthogonal to $a$.

3. Let $a$ and $b$ be linearly independent vectors given in $V_3$ by their Cartesian
coordinates. Find a nonzero vector orthogonal to both vectors.

4. What is the locus of vectors orthogonal to a given vector? 


27. Euclidean space 

Abstract vector spaces studied earlier are in 
a sense poorer in concepts and properties than spaces of directed line 
segments, first of all because they do not reflect the most important 
facts associated with measurement of lengths, angles, areas, volumes 
and so on. Metric notions can be extended to abstract vector spaces 
in different ways. The most efficient method of specifying measure¬ 
ments is through axiomatic introduction of a scalar product of vectors. 
We shall begin our studies with real vector spaces. 

A real vector space E is said to be Euclidean if every pair of vectors 
x and y in E is assigned a real number (x, y) called a scalar product, 
with the following axioms holding: 

    (1) $(x, y) = (y, x)$,
    (2) $(\lambda x, y) = \lambda (x, y)$,
    (3) $(x + y, z) = (x, z) + (y, z)$,                    (27.1)
    (4) $(x, x) > 0$ for $x \ne 0$; $(0, 0) = 0$

for arbitrary vectors $x$, $y$ and $z$ in $E$ and an arbitrary real number $\lambda$.

As is already known, it follows from these axioms that we can 
carry out formal algebraic transformations using a scalar product, 
i.e. 

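    $\left( \sum_{i=1}^{r} \alpha_i x_i, \sum_{j=1}^{s} \beta_j y_j \right) = \sum_{i=1}^{r} \sum_{j=1}^{s} \alpha_i \beta_j (x_i, y_j)$

for any vectors $x_i$ and $y_j$, any numbers $\alpha_i$ and $\beta_j$ and any numbers
$r$ and $s$ of summands.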

Any linear subspace L of a Euclidean space E is a Euclidean space 
with the scalar product introduced in E. 

It is easy to show a general method of introducing a scalar product
in an arbitrary real space $K$. Let $e_1, e_2, \ldots, e_n$ be some basis of $K$.
Take two arbitrary vectors $x$ and $y$ in $K$ and suppose that

    $x = \xi_1 e_1 + \xi_2 e_2 + \ldots + \xi_n e_n,$
    $y = \eta_1 e_1 + \eta_2 e_2 + \ldots + \eta_n e_n.$

A scalar product of vectors can now be introduced, for example, as
follows:

    $(x, y) = \xi_1 \eta_1 + \xi_2 \eta_2 + \ldots + \xi_n \eta_n.$    (27.2)

It is not hard to check that all the axioms hold. Therefore the vector
space $K$ with scalar product (27.2) is Euclidean.

Notice that a scalar product can be introduced in $K$ in other ways.
For example, a scalar product in $K$ is the following expression:

    $(x, y) = \alpha_1 \xi_1 \eta_1 + \alpha_2 \xi_2 \eta_2 + \ldots + \alpha_n \xi_n \eta_n$

for any fixed positive numbers $\alpha_1, \alpha_2, \ldots, \alpha_n$. We should not be
confused by this lack of uniqueness. For is there anything strange 
in the fact that lengths can be measured in metres or inches, angles 
can be measured in degrees or radians and so on? It is this lack of 
uniqueness that makes it possible to take the fullest account of the 
properties of particular spaces when introducing a scalar product 
in them. 
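
The non-uniqueness of the scalar product is easy to see in a small experiment. The Python sketch below is an illustration with hypothetical names: `make_scalar_product` builds a scalar product on coordinate tuples from a list of fixed positive weights, so that unit weights give (27.2) and other weights give the weighted expression above.

```python
def make_scalar_product(weights):
    """Return a function computing sum(w_i * xi_i * eta_i) for fixed positive w_i."""
    if any(w <= 0 for w in weights):
        # With a zero or negative weight, axiom 4 of (27.1) would fail.
        raise ValueError("all weights must be positive")

    def product(x, y):
        return sum(w * p * q for w, p, q in zip(weights, x, y))

    return product

standard = make_scalar_product([1, 1, 1])   # the scalar product (27.2)
weighted = make_scalar_product([2, 3, 5])   # another admissible scalar product
```

Both functions are symmetric and linear in the first argument, and both give a positive value for `(x, x)` when `x` is nonzero, yet they assign different numbers to the same pair of vectors.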

Introducing a scalar product in spaces of directed line segments 
we had to define it separately when at least one of the segments was 
zero. The scalar product was assumed to be zero. Now this fact is 
a property arising from axioms (27.1). If x is an arbitrary vector 
in E, then 

    $(0, x) = (0x, x) = 0 \cdot (x, x) = 0.$

Of course, by the first axiom of (27.1) (x, 0) = 0. 

A vector $x$ of a Euclidean space is said to be normed if $(x, x) = 1$.
Any nonzero vector $y$ can be normed by multiplying it by some number
$\lambda$. Indeed, under the hypothesis

    $(\lambda y, \lambda y) = \lambda^2 (y, y) = 1,$

and therefore as a normalization factor we may take

    $\lambda = (y, y)^{-1/2}.$

A system of vectors is said to be normed if all its vectors are normed.
It follows from the foregoing that any system of nonzero vectors
can be normed.
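
As a small illustration of norming, the following Python sketch multiplies a nonzero vector by the factor $(y, y)^{-1/2}$. The names are hypothetical, and the scalar product used is the coordinate one of (27.2); any other function satisfying the axioms could be passed in its place.

```python
def dot(x, y):
    """Scalar product (27.2) on coordinate tuples."""
    return sum(p * q for p, q in zip(x, y))

def normalize(y, product=dot):
    """Return lam * y with lam = (y, y)^(-1/2), so that the result is normed."""
    s = product(y, y)
    if s == 0:
        raise ValueError("the zero vector cannot be normed")
    lam = s ** -0.5
    return tuple(lam * c for c in y)
```

For the vector (3, 4) the factor is 1/5, and the normed vector is approximately (0.6, 0.8).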

One of the most important properties of a scalar product is stated 
by the following 

Theorem 27.1 (the Cauchy-Buniakowski-Schwarz inequality).
For any two vectors $x$ and $y$ of a Euclidean space

    $(x, y)^2 \le (x, x)(y, y).$

Proof. The theorem is clearly true if $y = 0$, and therefore we
assume that $y \ne 0$. Consider a vector $x - \lambda y$, where $\lambda$ is an arbitrary
real number. We have

    $(x - \lambda y, x - \lambda y) = (x, x) - 2\lambda (x, y) + \lambda^2 (y, y).$

The left-hand side of the equation contains a scalar product of
equal vectors. Therefore the second-degree trinomial on the right
is nonnegative for any $\lambda$, in particular for

    $\lambda = \frac{(x, y)}{(y, y)}.$    (27.3)

Thus

    $(x, x) - 2 \frac{(x, y)}{(y, y)} (x, y) + \frac{(x, y)^2}{(y, y)^2} (y, y) = (x, x) - \frac{(x, y)^2}{(y, y)} \ge 0,$

which proves the theorem.

As in the case of spaces of directed line segments, two vectors $x$
and $y$ of any vector space are said to be collinear if either $x = \lambda y$
or $y = \mu x$ for some numbers $\lambda$ and $\mu$. From $0 = 0x$ we conclude that
two vectors are obviously collinear if there is at least one zero vector
among them. A very convenient means of testing vectors for collinearity
is the Cauchy-Buniakowski-Schwarz inequality. Namely,
we have

Theorem 27.2. The Cauchy-Buniakowski-Schwarz inequality becomes 
an equation if and only if vectors x and y are collinear. 

Proof. Let $x$ and $y$ be collinear. Suppose for definiteness that
$x = \lambda y$. We find

    $(x, y)^2 = (\lambda y, y)^2 = \lambda^2 (y, y)^2,$
    $(x, x)(y, y) = (\lambda y, \lambda y)(y, y) = \lambda^2 (y, y)^2.$

Comparing these equations shows that the sufficiency of the statement
of the theorem holds.

Suppose now that for some vectors $x$ and $y$

    $(x, y)^2 = (x, x)(y, y).$    (27.4)

If $y = 0$, then the vectors are collinear. If $y \ne 0$, then taking $\lambda$
according to (27.3) and considering (27.4) we get

    $(x - \lambda y, x - \lambda y) = 0.$

By virtue of the last of the axioms (27.1) this means that $x - \lambda y = 0$
or $x = \lambda y$, i.e. that $x$ and $y$ are collinear. Necessity also
holds.
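
The collinearity test given by Theorems 27.1 and 27.2 can be tried on vectors with integer coordinates, where the arithmetic is exact. The Python sketch below is illustrative; the function names are hypothetical and the scalar product is the coordinate one of (27.2).

```python
def dot(x, y):
    """Scalar product (27.2) on coordinate tuples."""
    return sum(p * q for p, q in zip(x, y))

def cbs_gap(x, y):
    """(x, x)(y, y) - (x, y)^2, which Theorem 27.1 shows to be nonnegative."""
    return dot(x, x) * dot(y, y) - dot(x, y) ** 2

def are_collinear(x, y):
    """By Theorem 27.2, x and y are collinear exactly when the gap is zero.

    The exact comparison is suitable for integer coordinates only; with
    floating-point data a tolerance would be needed.
    """
    return cbs_gap(x, y) == 0
```

For x = (1, 2, 3) and y = (2, 4, 6) the gap is zero, since y = 2x, while for y = (4, 5, 6) it equals 14 * 77 - 32 * 32 = 54.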

As an example consider a space $R_n$. It can be made Euclidean if
for the vectors

    $x = (\alpha_1, \alpha_2, \ldots, \alpha_n),$
    $y = (\beta_1, \beta_2, \ldots, \beta_n)$

a scalar product is introduced as follows:

    $(x, y) = \sum_{i=1}^{n} \alpha_i \beta_i.$    (27.5)

It is obvious that axioms (27.1) hold. The Cauchy-Buniakowski-Schwarz
inequality then means that

    $\left( \sum_{i=1}^{n} \alpha_i \beta_i \right)^2 \le \left( \sum_{i=1}^{n} \alpha_i^2 \right) \left( \sum_{i=1}^{n} \beta_i^2 \right)$    (27.6)

for any real numbers $\alpha_i$ and $\beta_i$.

Exercises 

1. Introduce a scalar product in a space of polynomials 
with real coefficients in a single variable. 

2. Will a space $R_n$ be Euclidean if the scalar product is introduced in it as
follows:

    $(x, y) = \sum_{i=1}^{n} |\alpha_i| |\beta_i|$ ?

3. What is the geometrical meaning of the Cauchy-Buniakowski-Schwarz 
inequality in spaces of directed line segments? 

4. Prove that $x = y$ if and only if $(x, d) = (y, d)$ for every vector $d$.

28. Orthogonality 

The most important relation between the 
vectors of a Euclidean space is orthogonality. 

By definition vectors x and y are said to be orthogonal if (x, y) = 0. 
By the first axiom of (27.1) the orthogonality relation of two vectors 
is symmetrical. In fact in a space of directed line segments the con¬ 
cept of orthogonality coincides with that of perpendicularity. 
Orthogonality may therefore be regarded as an extension of the 
notion of perpendicularity to abstract Euclidean spaces. 

A system of vectors of a Euclidean space is said to be orthogonal 
if either it consists of a single vector or its vectors are mutually 
orthogonal. If an orthogonal system consists of nonzero vectors, 
then it can be normed. A normed orthogonal system is called ortho¬ 
normal. 

Interest in orthogonal and orthonormal systems is due to the
advantages they offer in investigating Euclidean spaces.

Thus, for example, any orthogonal system of nonzero vectors and
of course any orthonormal system is linearly independent. Indeed,
let a system $x_1, x_2, \ldots, x_k$ be orthogonal and let $x_i \ne 0$ for every $i$.
This means that $(x_i, x_j) = 0$ for $i \ne j$, but that $(x_i, x_i) \ne 0$ for
every $i$. We write

    $\alpha_1 x_1 + \alpha_2 x_2 + \ldots + \alpha_k x_k = 0.$

On performing a scalar multiplication of this equation by any
vector $x_i$ we find

    $\alpha_1 (x_i, x_1) + \alpha_2 (x_i, x_2) + \ldots + \alpha_k (x_i, x_k) = 0.$
Consequently,

    $\alpha_i (x_i, x_i) = 0$    (28.1)

and, since $(x_i, x_i) \ne 0$, of course $\alpha_i = 0$. Thus the system of vectors
$x_1, x_2, \ldots, x_k$ is linearly independent.

From (28.1) we deduce, in particular, that if a sum of mutually 
orthogonal vectors is zero, then all the vectors are zero. 

Especially many useful consequences arise from the assumption
that some orthonormal system $e_1, e_2, \ldots, e_s$ may form a basis of
a Euclidean space $E$. In this case every vector $x$ in $E$ must be
uniquely represented as a linear combination

    $x = \alpha_1 e_1 + \alpha_2 e_2 + \ldots + \alpha_s e_s.$

But on performing a scalar multiplication of this equation by $e_i$
we obtain an explicit expression for the coefficients of the expansion
with respect to a basis. Namely,

    $\alpha_i = (x, e_i).$    (28.2)

If for another vector $y$

    $y = \beta_1 e_1 + \beta_2 e_2 + \ldots + \beta_s e_s,$

then on carrying out simple transformations we find that

    $(x, y) = \alpha_1 \beta_1 + \alpha_2 \beta_2 + \ldots + \alpha_s \beta_s.$    (28.3)

In particular,

    $(x, x) = \alpha_1^2 + \alpha_2^2 + \ldots + \alpha_s^2.$    (28.4)
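
Formulas (28.2) and (28.4) can be checked numerically on a concrete orthonormal basis. In the Python sketch below everything is an assumption made for illustration: the function names are hypothetical, the space is the plane with the coordinate scalar product, and the basis chosen is e_1 = (0.6, 0.8), e_2 = (-0.8, 0.6).

```python
def dot(x, y):
    """Coordinate scalar product on tuples."""
    return sum(p * q for p, q in zip(x, y))

def coefficients(x, basis):
    """Expansion coefficients alpha_i = (x, e_i) in an orthonormal basis, (28.2)."""
    return [dot(x, e) for e in basis]

def expand(coeffs, basis):
    """Rebuild the vector alpha_1 e_1 + ... + alpha_s e_s from its coefficients."""
    n = len(basis[0])
    return tuple(sum(c * e[k] for c, e in zip(coeffs, basis)) for k in range(n))

basis = [(0.6, 0.8), (-0.8, 0.6)]       # an orthonormal pair in the plane
alphas = coefficients((1.0, 2.0), basis)  # approximately [2.2, 0.4]
```

Rebuilding the vector with `expand(alphas, basis)` returns (1, 2) up to rounding, and the sum of the squared coefficients is 5, the value of (x, x), in agreement with (28.4).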

Before going on with such studies we shall see if there is a basis 
consisting of orthonormal vectors. 

A basis whose vectors form an orthonormal system is called orthonormal.
The existence of such a basis in a Euclidean space is proved by

Theorem 28.1. There is an orthonormal basis in any finite dimensional
Euclidean space $E$.

Proof. Let $\dim E = n$. An orthonormal system is linearly independent
and therefore it cannot contain more than $n$ vectors. Suppose
a system $e_1, e_2, \ldots, e_s$ contains a maximum number of orthonormal
vectors. This means that in $E$ there is no nonzero vector
orthogonal to all vectors $e_1, e_2, \ldots, e_s$. If some vector is orthogonal
to these, then it must be zero.

Take an arbitrary vector $x$ in $E$. If the orthonormal system
$e_1, e_2, \ldots, e_s$ were a basis, then the vector $x$ would have to coincide
with a vector $y$, where

    $y = (x, e_1) e_1 + (x, e_2) e_2 + \ldots + (x, e_s) e_s.$

Therefore consider the vector $x - y$. We have

    $(x - y, e_i) = \left( x - \sum_{p=1}^{s} (x, e_p) e_p, e_i \right) = (x, e_i) - \sum_{p=1}^{s} (x, e_p)(e_p, e_i) = (x, e_i) - (x, e_i) = 0.$

The vector $x - y$ turns out to be orthogonal to all vectors $e_1, e_2, \ldots, e_s$.
Consequently, $x - y = 0$ or $x = y$.

So the linearly independent system $e_1, e_2, \ldots, e_s$ possesses the
property that any vector of $E$ is linearly expressible in terms of its
vectors, i.e. it forms a basis.

Corollary. Any orthonormal system of vectors e b e 2 , . . ., e h may 
■be supplemented to an orthonormal basis. 

Indeed, choose among the orthonormal systems containing a given 
system the one that has a maximum number of vectors. Let it be 
o system e y , . . ., e h , e h + l , . . ., e s . Repeating then word for word 
the proof of Theorem 28.1 we establish that the new system is a basis. 

Besides orthogonal vectors in a Euclidean space we shall discuss
orthogonal sets of vectors. Two sets F and G of vectors of E are said
to be orthogonal if every vector in F is orthogonal to every vector
in G. This is designated $F \perp G$.

Of course a set may consist of a single vector. If some vector of
a set is orthogonal to the entire set, then it is, in particular,
orthogonal to itself. Consequently, it can only be the zero vector.

Lemma 28.1. For a vector x to be orthogonal to a subspace L it is
necessary and sufficient that it should be orthogonal to each vector of
some basis of L.

Proof. Fix a basis $y_1, y_2, \ldots, y_k$ of a subspace L. If $x \perp L$,
then x is orthogonal to every vector in L and in particular to
$y_1, y_2, \ldots, y_k$. Now let $(x, y_i) = 0$ for every i. Take an arbitrary
vector z in L and expand it with respect to the basis vectors. If

$z = \alpha_1 y_1 + \alpha_2 y_2 + \ldots + \alpha_k y_k$

for some numbers $\alpha_1, \alpha_2, \ldots, \alpha_k$, then

$(x, z) = (x, \alpha_1 y_1 + \alpha_2 y_2 + \ldots + \alpha_k y_k) = \alpha_1 (x, y_1) + \alpha_2 (x, y_2) + \ldots + \alpha_k (x, y_k) = 0.$

This means that $x \perp L$.

Corollary. For two subspaces to be orthogonal it is necessary and
sufficient that each vector of some basis of one of the subspaces should
be orthogonal to each vector of some basis of the other.

The sum K of linear subspaces $L_1, L_2, \ldots, L_m$ is said to be
orthogonal if the subspaces are mutually orthogonal. To denote
an orthogonal sum the following notation will be used:

$K = L_1 \oplus L_2 \oplus \ldots \oplus L_m.$

Lemma 28.2. An orthogonal sum of nonzero subspaces is always
a direct sum.

Proof. Choose in each subspace an orthonormal basis and consider
the system of vectors which is the union of bases of all subspaces.
It is clear that each vector of the orthogonal sum is linearly
expressible in terms of the vectors of the system constructed. But this
system is linearly independent since it consists of nonzero mutually
orthogonal vectors. Now the lemma follows from Theorem 20.1.

Let a Euclidean space K be represented as an orthogonal sum of
its subspaces $L_1, L_2, \ldots, L_m$. Then the collection of these
subspaces may be regarded as a generalized orthogonal basis. In
particular, if for any vectors x and y in K we write their expansions with
respect to the subspaces $L_1, L_2, \ldots, L_m$, i.e. represent them as

$x = x_1 + x_2 + \ldots + x_m,$
$y = y_1 + y_2 + \ldots + y_m,$

where $x_i, y_i \in L_i$, then it is easy to establish that

$(x, y) = (x_1, y_1) + (x_2, y_2) + \ldots + (x_m, y_m).$ (28.5)

This formula is similar to formula (28.3).

Consider an arbitrary nonempty set F of vectors of a Euclidean
space E. The collection of all vectors orthogonal to F is called the
orthogonal complement of F and designated $F^\perp$. The orthogonal
complement is a subspace. Indeed, if vectors $x, y \in F^\perp$, then $x, y \perp F$.
But then $\alpha x + \beta y \perp F$ for any numbers $\alpha$ and $\beta$, i.e.
$\alpha x + \beta y \in F^\perp$.

Theorem 28.2. A Euclidean space E is the orthogonal sum of any
of its linear subspaces, L, and its orthogonal complement $L^\perp$, i.e.

$E = L \oplus L^\perp.$

Proof. Let dim L = s, dim $L^\perp$ = m. Choose some orthonormal
basis $e_1, \ldots, e_s$ of a subspace L and some orthonormal basis
$r_1, \ldots, r_m$ of $L^\perp$. The system of vectors $e_1, \ldots, e_s, r_1, \ldots, r_m$ is
orthonormal and hence linearly independent.

If the system is not a basis of E, then it can be supplemented to
an orthonormal basis of E. Let e be one of the complementary vectors.
It is orthogonal to the vectors $e_1, \ldots, e_s$ and therefore $e \perp L$,
i.e. $e \in L^\perp$. But, on the other hand, e is orthogonal to $r_1, \ldots, r_m$
and therefore $e \perp L^\perp$. So e is both in $L^\perp$ and orthogonal to $L^\perp$.
Consequently, e = 0, which is impossible for a vector of an orthonormal
system. Hence the system is a basis of E, which proves the theorem.


Decomposing a space as an orthogonal sum of its subspaces allows
many studies to be carried out efficiently. The following example
will serve as an illustration.

Consider a Euclidean space E with some fixed system of vectors
$x_1, x_2, \ldots, x_k$. If the rank of that system equals the dimension
of E, then it is obvious that the only vector in E orthogonal to all
vectors of the given system is the zero vector. We have the converse

Lemma 28.3. If in a Euclidean space E some system of vectors
$x_1, x_2, \ldots, x_k$ is given and the only vector in E orthogonal to those vectors
is the zero vector, then the rank of the system equals the dimension of E.

Proof. Denote by L the span of the system of vectors $x_1, x_2, \ldots, x_k$.
Any vector orthogonal to these vectors is orthogonal to L,
i.e. is in the orthogonal complement $L^\perp$. According to the hypothesis
of the lemma $L^\perp$ consists of the zero vector only. Since $E = L \oplus L^\perp$,
it follows that the dimension of L coincides with that of E. But the
dimension of L equals the rank of the system of vectors $x_1, x_2, \ldots, x_k$.
Thus the lemma is proved.

Exercises

1. Prove that if the scalar product of any two vectors
of a Euclidean space is expressed by equation (28.3), then the basis relative
to which the coordinates are taken is orthonormal.

2. Prove that if the scalar product of any vector of a Euclidean space by
itself is expressed by equation (28.4), then the basis relative to which the
coordinates are taken is orthonormal.

3. Prove that if two sets consisting of a finite number of vectors are
orthogonal, then so are the spans constructed on those sets.

4. Prove that the intersection of two orthogonal subspaces consists only of
the zero vector.

5. Prove that if a Euclidean space is the direct sum of its subspaces and
(28.5) holds for any two vectors, then the subspaces are mutually orthogonal.

6. Prove that for any subspaces L and M of a Euclidean space E

$\dim L + \dim L^\perp = \dim E,$
$(L^\perp)^\perp = L,$
$(L + M)^\perp = L^\perp \cap M^\perp,$
$(L \cap M)^\perp = L^\perp + M^\perp.$

29. Lengths, angles, distances

We now extend to the elements of a Euclidean
space such notions as length, angle and distance. We shall proceed
from the analogy with spaces of directed line segments.

The length |x| of a vector x of a Euclidean space E is

$|x| = +(x, x)^{1/2}.$

Every vector has length. According to the last axiom of (27.1)
it is positive for nonzero vectors and equal to zero for a zero vector.
Also the equation

$|\lambda x| = (\lambda x, \lambda x)^{1/2} = (\lambda^2 (x, x))^{1/2} = |\lambda| \, |x|$

shows that it is possible to take the absolute value of the numerical
factor $\lambda$ outside the vector-length sign. As already noted a nonzero
vector can be normed, i.e. multiplied by a number such that the
length of the resulting vector is equal to unity.

The angle {x, y} between nonzero vectors x and y of a Euclidean
space E is the angle defined by the relations

$\cos\{x, y\} = \frac{(x, y)}{|x| \, |y|}, \quad 0 \le \{x, y\} \le \pi.$

If there is at least one zero vector among the vectors x and y, then
the angle between such vectors is said to be undefined.

The Cauchy-Buniakowski-Schwarz inequality allows us to state
that the expression which we have called the cosine of the angle
between vectors does not exceed unity in absolute value. Therefore
the angle between any nonzero vectors is always uniquely defined.
It remains unchanged under multiplication of the vectors by any
positive numbers and by Theorem 27.2 equals 0 or $\pi$ if and only if
the nonzero vectors are collinear. All this is in full agreement with
the concept of the angle between directed line segments.

Take two nonzero vectors x and y. Bearing in mind the analogy
with directed line segments we shall assume them to be two sides
of some triangle. It is natural to take the vector x - y as its third
side. Using the definition of the length of a vector and of the angle
between vectors we find

$|x - y|^2 = (x - y, x - y) = (x, x) - 2 (x, y) + (y, y) = |x|^2 + |y|^2 - 2 |x| \, |y| \cos\{x, y\}.$ (29.1)

So we have shown that in a Euclidean space the square of the
length of any side of a triangle is equal to the sum of the squares
of the lengths of its two other sides minus the doubled product of
the lengths of those sides by the cosine of the angle between them.

If it is a right triangle, i.e. if the angle between the vectors x and y
is a right one, then obviously

$|x - y|^2 = |x|^2 + |y|^2.$ (29.2)

This is nothing other than a formal expression of the well-known
Pythagorean theorem.
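Relations (29.1) and (29.2) lend themselves to a quick numerical check. The sketch below assumes the standard scalar product on coordinate vectors; the function names `length` and `angle` and the sample vectors are illustrative choices, not notation from the text.

```python
import math

def dot(x, y):
    """Standard scalar product of two real coordinate vectors."""
    return sum(a * b for a, b in zip(x, y))

def length(x):
    """|x| = +(x, x)^(1/2)."""
    return math.sqrt(dot(x, x))

def angle(x, y):
    """Angle {x, y} in [0, pi]; undefined if either vector is zero."""
    lx, ly = length(x), length(y)
    if lx == 0.0 or ly == 0.0:
        raise ValueError("the angle is undefined for a zero vector")
    # Clamp to [-1, 1] so that rounding cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, dot(x, y) / (lx * ly))))

def sub(x, y):
    return tuple(a - b for a, b in zip(x, y))

# General triangle with sides x, y and x - y: both sides of (29.1).
x, y = (3.0, 0.0, 4.0), (1.0, 2.0, 2.0)
lhs = length(sub(x, y)) ** 2
rhs = length(x) ** 2 + length(y) ** 2 - 2 * length(x) * length(y) * math.cos(angle(x, y))

# Right triangle: (u, v) = 0, so (29.2) applies.
u, v = (1.0, 1.0, 0.0), (1.0, -1.0, 5.0)
pythagoras_lhs = length(sub(u, v)) ** 2
pythagoras_rhs = length(u) ** 2 + length(v) ** 2
```

The clamping step in `angle` is a floating-point precaution only; in exact arithmetic the Cauchy-Buniakowski-Schwarz inequality already keeps the quotient in the interval from -1 to 1.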

Consider again an arbitrary triangle. Since the cosine of the angle
between vectors does not exceed unity in absolute value, it follows
from (29.1) that

$|x - y|^2 \le (|x| + |y|)^2,$
$|x - y|^2 \ge (|x| - |y|)^2,$

and hence

$|x - y| \le |x| + |y|,$
$|x - y| \ge \bigl|\, |x| - |y| \,\bigr|.$ (29.3)

Thus in a Euclidean space the length of a triangle side does not
exceed the sum of the lengths of the other two sides but is not less
than the difference of their lengths.

The distance $\rho(x, y)$ between vectors x and y of a Euclidean space
is the quantity

$\rho(x, y) = |x - y|.$ (29.4)

It satisfies the three natural properties of distances between vectors
(in point interpretation!) in spaces of directed line segments. Namely,
for any vectors x, y and z of a Euclidean space

(1) $\rho(x, y) = \rho(y, x)$,

(2) $\rho(x, y) > 0$ if $x \ne y$; $\rho(x, y) = 0$ if $x = y$, (29.5)

(3) $\rho(x, y) \le \rho(x, z) + \rho(z, y)$.

The first two properties are obvious. The last one is nothing other than
a generalization of the well-known "triangle inequality". It follows
from the first of the inequalities (29.3) if we replace x by x - z
and y by y - z.

The distance $\rho(A, B)$ between sets A and B of vectors of the same
space is the quantity

$\rho(A, B) = \inf_{x \in A, \; y \in B} \rho(x, y).$
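The distance (29.4), its properties (29.5) and the distance between sets can be illustrated as follows. This is a sketch under stated assumptions: the scalar product is the standard one on coordinate vectors, and the sets are finite, so the infimum is attained and becomes a minimum. The names `distance` and `set_distance` are hypothetical.

```python
import math
from itertools import product

def distance(x, y):
    """rho(x, y) = |x - y| for the standard scalar product."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def set_distance(A, B):
    """rho(A, B) for finite nonempty sets, where the infimum is a minimum."""
    return min(distance(x, y) for x, y in product(A, B))

x, y, z = (1.0, 2.0, 3.0), (4.0, 6.0, 3.0), (0.0, -1.0, 2.0)

symmetric = distance(x, y) == distance(y, x)                      # property (1)
positive = distance(x, y) > 0 and distance(x, x) == 0             # property (2)
triangle = distance(x, y) <= distance(x, z) + distance(z, y)      # property (3)

# Two finite sets of points in the plane.
A = [(0.0, 0.0), (1.0, 0.0)]
B = [(4.0, 4.0), (1.0, 3.0)]
rho_AB = set_distance(A, B)
```

For infinite sets, such as subspaces, the infimum cannot be found by enumeration and has to be computed by other means.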

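In conclusion note the following fact. Let $e_1, e_2, \ldots, e_s$ be an
orthonormal basis fixed in a Euclidean space E. For any two vectors
x and y given by their coordinates

$x = (\alpha_1, \alpha_2, \ldots, \alpha_s), \quad y = (\beta_1, \beta_2, \ldots, \beta_s)$

relative to that basis we have by (28.3)

$|x| = (\alpha_1^2 + \alpha_2^2 + \ldots + \alpha_s^2)^{1/2}.$

Consequently,

$\cos\{x, y\} = \frac{\alpha_1 \beta_1 + \alpha_2 \beta_2 + \ldots + \alpha_s \beta_s}{(\alpha_1^2 + \alpha_2^2 + \ldots + \alpha_s^2)^{1/2} \, (\beta_1^2 + \beta_2^2 + \ldots + \beta_s^2)^{1/2}}.$

Close analogy with formulas (25.4) and (25.5) is obvious.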

Thus the concepts of length, angle and distance we have introduced
fully agree with similar notions in spaces of directed line
segments.

Exercises

1. Prove that the length of the sum of any number of
vectors does not exceed the sum of the lengths of those vectors.

2. Prove that the square of the length of the sum of any number of
orthogonal vectors is equal to the sum of the squares of the lengths of those vectors.

3. Given a Euclidean space of polynomials in a single variable t, find the
angles of a triangle formed by the vectors 1, $t^2$ and $1 - t^2$.

4. What is the distance between the polynomials $3t^2 + 6$ and $2t^3 + t + 1$?

5. Prove that a triangle in a Euclidean space is a right triangle if and only
if the length of one of its sides is equal to the product of the length of another
side by the cosine of the angle between them.


30. Inclined line, perpendicular, projection

Before extending to abstract Euclidean spaces
the concepts of inclined line, perpendicular and projection we
consider these notions in the space of directed line segments.

Let L be a plane. Drop to it from some point M a perpendicular
and denote its foot by $M_L$ (Fig. 30.1). To give this problem a vector
interpretation choose on L a point O and consider a space $V_3$ of
directed line segments fixed at O. The plane L forms a subspace.
Therefore the construction of the perpendicular dropped from M to L
reduces to decomposing the vector $\overrightarrow{OM}$ of the space as the sum

$\overrightarrow{OM} = \overrightarrow{OM_L} + \overrightarrow{M_L M},$ (30.1)

where $\overrightarrow{OM_L} \in L$ and $\overrightarrow{M_L M} \perp L$. From
geometrical considerations it is clear that decomposition (30.1)
always exists and is unique.

This example suggests how to set the problem of a perpendicular
in the general case. Suppose in a Euclidean space E some subspace L
is fixed. Take an arbitrary vector f in E and study the possibility
of decomposing it as a sum

$f = g + h,$ (30.2)

where $g \in L$ and $h \perp L$.

We have already encountered this problem. Indeed, the condition
$h \perp L$ is equivalent to $h \in L^\perp$. By Theorem 28.2 a Euclidean space E
is the direct sum of subspaces L and $L^\perp$. Therefore decomposition
(30.2) always exists and is unique.

Bearing in mind the analogy with decomposition (30.1) the vector g
in (30.2) will be called the projection of the vector f onto L, h will be
called the perpendicular from f to L, and f will be called the inclined
line to L.
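When an orthonormal basis of L is available, decomposition (30.2) can be computed directly: by the argument in the proof of Theorem 28.1, the projection is the sum of the terms $(f, e_i) e_i$ over that basis, and the perpendicular is the remainder. The following sketch assumes the standard scalar product on coordinate vectors; the function names and the sample subspace are hypothetical.

```python
import math

def dot(x, y):
    """Standard scalar product of two real coordinate vectors."""
    return sum(a * b for a, b in zip(x, y))

def projection(f, basis):
    """Projection g of f onto the span of the orthonormal list `basis`."""
    g = [0.0] * len(f)
    for e in basis:
        c = dot(f, e)                       # coefficient (f, e_i)
        g = [gk + c * ek for gk, ek in zip(g, e)]
    return g

def perpendicular(f, basis):
    """Perpendicular h = f - g from f to the span of `basis`."""
    return [fk - gk for fk, gk in zip(f, projection(f, basis))]

# Hypothetical subspace L of R^3: a plane given by an orthonormal basis.
r = 1.0 / math.sqrt(2.0)
L_basis = [(r, r, 0.0), (0.0, 0.0, 1.0)]

f = (3.0, 1.0, 2.0)          # the inclined line
g = projection(f, L_basis)   # lies in L
h = perpendicular(f, L_basis)  # orthogonal to L
```

Here g is approximately (2, 2, 2) and h is approximately (1, -1, 0). If only a non-orthonormal basis of L is known, this formula does not apply as written and the basis would first have to be orthonormalized.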

In elementary geometry the length of a perpendicular is known
never to exceed that of an inclined line. A similar situation occurs
in a Euclidean space. The vectors g and h in decomposition (30.2)
are orthogonal. By the Pythagorean theorem therefore

$|f|^2 = |g|^2 + |h|^2,$

and so

$|h| \le |f|.$

It is clear that the length of the perpendicular h to a subspace L
is equal to the length of an inclined line f to the same subspace if
and only if $f \perp L$.

The problem of a perpendicular may be given another interpretation.
Consider again an arbitrary vector f in E. It is not necessarily
in L. Consequently, we may seek a vector in L that
is closest to f in the sense of the earlier introduced distance.

Take an arbitrary vector z in L. Subtracting it from both sides
of (30.2) we get

$f - z = (g - z) + h.$

Since h is orthogonal to g - z, by the Pythagorean theorem

$|f - z|^2 = |g - z|^2 + |h|^2.$

Therefore

$|f - z| \ge |h|,$

equality holding if and only if z = g.

So of all the vectors in L the projection of f onto L is closest to f.
This means that

$\rho(f, L) = \rho(f, g).$


By analogy with directed line segments we can say that the angle
between f and L is the smallest of the angles between f and the vectors
z in L. Taking into account the Cauchy-Buniakowski-Schwarz
inequality and decomposition (30.2) we find

$\cos\{f, z\} = \frac{(f, z)}{|f| \, |z|} = \frac{(g + h, z)}{|f| \, |z|} = \frac{(g, z)}{|f| \, |z|} \le \frac{|g|}{|f|}.$

It is obvious that this inequality becomes an equality if and only
if z makes a zero angle with g.

Thus the angle between f and L coincides with that between f
and its projection onto L.

The above properties of a perpendicular and a projection reflect
the geometrical aspect of these concepts. Now we shall consider
them from an algebraic point of view. Given a fixed subspace L,
each vector f of a Euclidean space E uniquely determines relative
to L two of its components. Consequently, it may be assumed that
decomposition (30.2) gives two functions

$g = \mathrm{pr}_L f,$
$h = \mathrm{ort}_L f.$

The "independent variable" of a function may be any vector from
E, the "value" of the function $\mathrm{pr}_L f$ being a vector from L and the
"value" of $\mathrm{ort}_L f$ a vector from $L^\perp$.

By virtue of $(L^\perp)^\perp = L$ the perpendicular and the projection
are related by the following equations:

$\mathrm{pr}_L f = \mathrm{ort}_{L^\perp} f,$
$\mathrm{ort}_L f = \mathrm{pr}_{L^\perp} f.$ (30.3)

Therefore in fact the study of these functions always reduces to the
study of one of them.

Take two arbitrary vectors x and y in E. According to (30.2)

$x = \mathrm{pr}_L x + \mathrm{ort}_L x,$
$y = \mathrm{pr}_L y + \mathrm{ort}_L y.$ (30.4)

Adding termwise these equations and multiplying the first of them
by an arbitrary real number $\lambda$ yields

$x + y = (\mathrm{pr}_L x + \mathrm{pr}_L y) + (\mathrm{ort}_L x + \mathrm{ort}_L y),$
$\lambda x = (\lambda \, \mathrm{pr}_L x) + (\lambda \, \mathrm{ort}_L x).$

A straightforward check shows that the vectors in the first parentheses
are from L and those in the second parentheses are perpendicular
to L. According to the uniqueness of decomposition of the type (30.2)
this means that the relations

$\mathrm{pr}_L (x + y) = \mathrm{pr}_L x + \mathrm{pr}_L y,$
$\mathrm{pr}_L (\lambda x) = \lambda \, \mathrm{pr}_L x$ (30.5)

hold for the function $\mathrm{pr}_L$ and that of course the similar relations

$\mathrm{ort}_L (x + y) = \mathrm{ort}_L x + \mathrm{ort}_L y,$
$\mathrm{ort}_L (\lambda x) = \lambda \, \mathrm{ort}_L x$ (30.6)

hold for $\mathrm{ort}_L$. Formulas (25.9) and (30.5) fully coincide.

Notice that $\mathrm{ort}_L z = 0$ for any vector z in L. Therefore it follows
from the first of the equations (30.6) that

$\mathrm{ort}_L (x + z) = \mathrm{ort}_L x.$

Consequently, the value of the function $\mathrm{ort}_L$ remains unchanged
if we add to the independent variable any vector from L. In
particular, if we take $z = -\mathrm{pr}_L x$, then, taking into account (30.4),

$\mathrm{ort}_L (\mathrm{ort}_L x) = \mathrm{ort}_L x.$ (30.7)

A similar relation holds for a projection. Namely,

$\mathrm{pr}_L (\mathrm{pr}_L x) = \mathrm{pr}_L x.$ (30.8)
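The linearity relations (30.5) and the idempotence relations (30.7) and (30.8) can be observed numerically. This is a minimal sketch, assuming the standard scalar product and a subspace L given by an orthonormal basis; the helper names `pr`, `ort` and `close` and all sample values are illustrative.

```python
def dot(x, y):
    """Standard scalar product of two real coordinate vectors."""
    return sum(a * b for a, b in zip(x, y))

def pr(x, basis):
    """pr_L x for L spanned by the orthonormal list `basis`."""
    out = [0.0] * len(x)
    for e in basis:
        c = dot(x, e)
        out = [o + c * ek for o, ek in zip(out, e)]
    return out

def ort(x, basis):
    """ort_L x = x - pr_L x."""
    return [xk - pk for xk, pk in zip(x, pr(x, basis))]

def close(u, v, tol=1e-12):
    """Componentwise comparison up to rounding error."""
    return all(abs(a - b) <= tol for a, b in zip(u, v))

# Hypothetical orthonormal basis of a plane L in R^3.
L_basis = [(1.0, 0.0, 0.0), (0.0, 0.6, 0.8)]
x = [1.0, 2.0, 3.0]
y = [-2.0, 0.5, 4.0]
lam = -1.5

x_plus_y = [a + b for a, b in zip(x, y)]
lam_x = [lam * a for a in x]

# (30.5): additivity and homogeneity of the projection.
additive = close(pr(x_plus_y, L_basis),
                 [a + b for a, b in zip(pr(x, L_basis), pr(y, L_basis))])
homogeneous = close(pr(lam_x, L_basis), [lam * a for a in pr(x, L_basis)])

# (30.8) and (30.7): applying pr or ort twice changes nothing.
pr_idempotent = close(pr(pr(x, L_basis), L_basis), pr(x, L_basis))
ort_idempotent = close(ort(ort(x, L_basis), L_basis), ort(x, L_basis))
```

All four flags evaluate to true for these inputs; the tolerance in `close` only absorbs floating-point rounding.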

Now let a subspace L be the orthogonal sum of subspaces $L_1$ and
$L_2$. Take an arbitrary vector x in E and represent it as the sum

$x = (\mathrm{pr}_{L_1} x + \mathrm{pr}_{L_2} x) + (x - \mathrm{pr}_{L_1} x - \mathrm{pr}_{L_2} x).$

The vector in the first parentheses is obviously in the subspace
$L_1 \oplus L_2$. The vector in the second parentheses is orthogonal to
$L_1 \oplus L_2$, which is easy to show by transforming it with the aid
of (30.4) as follows:

$x - \mathrm{pr}_{L_1} x - \mathrm{pr}_{L_2} x = \mathrm{ort}_{L_1} x - \mathrm{pr}_{L_2} x = \mathrm{ort}_{L_2} x - \mathrm{pr}_{L_1} x.$ (30.9)

Therefore we conclude that

$\mathrm{pr}_{L_1 \oplus L_2} x = \mathrm{pr}_{L_1} x + \mathrm{pr}_{L_2} x.$

The perpendicular from x to $L_1 \oplus L_2$ is equal to one of the
expressions (30.9). If in particular $x \perp L_1$, then

$\mathrm{ort}_{L_1 \oplus L_2} x = \mathrm{ort}_{L_2} x.$ (30.10)


Exercises

1. Does the analogue of the theorem on three
perpendiculars hold in a Euclidean space?

2. Prove that the sum of two angles between a vector f and subspaces L
and $L^\perp$ is equal to $\pi/2$.

3. Find the perpendicular and projection of a vector f onto trivial subspaces.

4. Prove that if for fixed subspaces $L_1$ and $L_2$ and any vector x

$\mathrm{pr}_{L_1 + L_2} x = \mathrm{pr}_{L_1} x + \mathrm{pr}_{L_2} x,$

then the sum $L_1 + L_2$ is orthogonal.

5. Prove that if subspaces $L_1, L_2, \ldots, L_m$ are mutually orthogonal, then
for any vector x in E

$|x|^2 \ge \sum_{i=1}^{m} |\mathrm{pr}_{L_i} x|^2.$

31. Euclidean isomorphism

Carrying out our studies we have repeatedly
noted the coincidence of the properties of an abstract Euclidean space
and spaces of directed line segments. We could carry over to Euclidean
space the other facts and theorems of elementary geometry. But
there is no need for this.

We introduce the concept of Euclidean isomorphism. We shall
say that Euclidean spaces E and E' are Euclidean isomorphic if
they are isomorphic as real vector spaces and if in addition for any
pair of vectors x and y in E and the corresponding vectors x' and y'
in E'

$(x, y) = (x', y').$

Theorem 31.1. For two Euclidean spaces to be Euclidean isomorphic
it is necessary and sufficient that they should be of equal dimension.

Proof. If two Euclidean spaces E and E' are Euclidean isomorphic,
then they are isomorphic as real vector spaces as well. But such
vector spaces have the same dimension.

Consider now two Euclidean spaces E and E' of the same dimension n.
Let $e_1, e_2, \ldots, e_n$ be an orthonormal basis in E and let
$e'_1, e'_2, \ldots, e'_n$ be an orthonormal basis in E'. Assign to each vector

$x = \alpha_1 e_1 + \alpha_2 e_2 + \ldots + \alpha_n e_n$

of E a vector

$x' = \alpha_1 e'_1 + \alpha_2 e'_2 + \ldots + \alpha_n e'_n$

of E'. This correspondence was proved earlier to be an isomorphism.
Now take another pair of corresponding vectors in E and E'

$y = \beta_1 e_1 + \beta_2 e_2 + \ldots + \beta_n e_n,$
$y' = \beta_1 e'_1 + \beta_2 e'_2 + \ldots + \beta_n e'_n.$

By (28.3)

$(x, y) = \alpha_1 \beta_1 + \alpha_2 \beta_2 + \ldots + \alpha_n \beta_n = (x', y').$

Thus the theorem is proved.

We are concerned throughout only with such properties of vector
spaces that are consequences of the basic operations acting in spaces.
From this point of view Euclidean isomorphic spaces have the same
properties. Therefore any geometrical theorem proved in $V_3$ will
be true in any three-dimensional subspace of a Euclidean space.
Consequently, it will be also true in any Euclidean space. Of course,
the arithmetical space $R_n$ with a scalar product introduced according
to (27.2) may serve as a standard Euclidean space.

Exercises

1. Construct a Euclidean isomorphism from $V_2$ to $R_2$.

2. Prove that in Euclidean isomorphic spaces an orthonormal system of
vectors goes over into an orthonormal system.

3. Prove that in Euclidean isomorphic spaces the angles between pairs of
corresponding vectors are equal.

4. Prove that in Euclidean isomorphic spaces a perpendicular and a
projection go over respectively into a perpendicular and a projection.

32. Unitary spaces

We have extended the basic metric concepts
only to real vector spaces. Similar results hold for the complex vector
space.

A complex vector space U is said to be unitary if every pair of
vectors x and y in U is assigned a complex number (x, y) called
a scalar product, with the following axioms holding:

(1) $(x, y) = \overline{(y, x)}$,

(2) $(\lambda x, y) = \lambda (x, y)$,

(3) $(x + y, z) = (x, z) + (y, z)$,

(4) $(x, x) > 0$ for $x \ne 0$; $(0, 0) = 0$

for arbitrary vectors x, y and z in U and an arbitrary complex
number $\lambda$.

The bar in the first axiom signifies complex conjugation. This
single distinction from the axioms of a Euclidean space involves
no profound differences but we should not forget about it. Thus, while
in a Euclidean space $(x, \lambda y) = \lambda (x, y)$, in a unitary space
$(x, \lambda y) = \bar{\lambda} (x, y)$.

In a unitary space U we can introduce some metric concepts. The
length of a vector, as in the real case, will be the quantity

$|x| = +(x, x)^{1/2}.$

Every nonzero vector has a positive length and the length of a zero
vector is zero. For any complex $\lambda$

$|\lambda x| = |\lambda| \, |x|.$

The Cauchy-Buniakowski-Schwarz inequality

$|(x, y)|^2 \le (x, x)(y, y)$

is also true. The proof is similar to that for the real case.

The concept of the angle between vectors is not introduced in
a unitary space as a rule. Only the case where vectors x and y are
orthogonal is considered. As in the real case it is understood that

$(x, y) = 0.$

It is obvious that $(y, x) = \overline{(x, y)} = 0$.

Essentially the entire theory of Euclidean spaces discussed above
can be carried over without changes in the definitions and general
schemes of the proofs to unitary spaces.

The arithmetical space $C_n$ may serve as the standard unitary space
if for the vectors

$x = (\alpha_1, \alpha_2, \ldots, \alpha_n),$
$y = (\beta_1, \beta_2, \ldots, \beta_n)$

their scalar product is introduced as follows:

$(x, y) = \sum_{i=1}^{n} \alpha_i \bar{\beta}_i.$ (32.1)

Using this space it is easy to show the part played by complex
conjugation in the first axiom. If the scalar product were introduced
in $C_n$ according to formula (27.2), then in $C_3$, for example, for the
vector

$x = (3, 4, 5i)$

we would have

$(x, x) = 9 + 16 + 25 i^2 = 0.$

The fourth axiom would be found to have failed.
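The role of conjugation can be reproduced with complex arithmetic. In the sketch below `unitary_dot` implements (32.1), while `bilinear_dot` applies the sum of coordinate products without conjugation, which is what the example above takes formula (27.2) to mean; both function names, the vector y and the number `lam` are assumptions made for illustration.

```python
def unitary_dot(x, y):
    """Scalar product (32.1): sum of x_i times the conjugate of y_i."""
    return sum(a * b.conjugate() for a, b in zip(x, y))

def bilinear_dot(x, y):
    """Sum of coordinate products with no conjugation (the real-case formula)."""
    return sum(a * b for a, b in zip(x, y))

x = (3, 4, 5j)            # the vector from the text
y = (1j, 2, 1 - 1j)       # a second, hypothetical vector
lam = 2 + 3j

with_conjugation = unitary_dot(x, x)      # 9 + 16 + 25 = 50
without_conjugation = bilinear_dot(x, x)  # 9 + 16 - 25 = 0 although x != 0

# Axiom (1): (x, y) equals the conjugate of (y, x).
axiom_1 = unitary_dot(x, y) == unitary_dot(y, x).conjugate()

# In a unitary space (x, lam*y) = conj(lam) * (x, y).
lam_y = tuple(lam * b for b in y)
conj_rule = abs(unitary_dot(x, lam_y) - lam.conjugate() * unitary_dot(x, y)) < 1e-12
```

With conjugation the product of x with itself is the positive real number 50, so the fourth axiom holds; without it the result is zero for a nonzero vector.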

Exercises 

1. Compare a Euclidean space R, and a unitary 

space Ct. 

2. Write the Cauchy-Buniakowski-Schwarz inequality for a space $C_n$.

3. If in a complex space the scalar product is introduced according to
axioms (27.1), can the Cauchy-Buniakowski-Schwarz inequality hold in such a space?

4. If in a complex space the scalar product is introduced according to
axioms (27.1), can there be an orthogonal basis in such a space?

33. Linear dependence and orthonormal systems

We have already noted in Section 52 that
the linear independence of a system of basis vectors may be violated
by a small change in the vectors. This phenomenon leads to great
difficulties in the use of the concept of basis in solving practical
problems. It is important to stress, however, that not all bases
possess so unpleasant a feature. In particular, it is lacking in any
orthonormal basis.

Let $e_1, e_2, \ldots, e_n$ be an arbitrary orthonormal basis chosen in
a Euclidean or a unitary space. If for some vector b

$b = \sum_{i=1}^{n} \alpha_i e_i,$

then by (28.4)

$|b|^2 = \sum_{i=1}^{n} |\alpha_i|^2.$ (33.1)

Consider now a system of vectors $e_1 + \varepsilon_1, e_2 + \varepsilon_2, \ldots, e_n + \varepsilon_n$
and suppose that it is linearly dependent. This means that there
are numbers $\beta_1, \beta_2, \ldots, \beta_n$ not all zero such that

$\sum_{i=1}^{n} \beta_i (e_i + \varepsilon_i) = 0.$

It follows that

$\sum_{i=1}^{n} \beta_i e_i = -\sum_{i=1}^{n} \beta_i \varepsilon_i.$

Using (33.1) and (27.6) we get

$\sum_{i=1}^{n} |\beta_i|^2 = \Bigl|\sum_{i=1}^{n} \beta_i e_i\Bigr|^2 = \Bigl|\sum_{i=1}^{n} \beta_i \varepsilon_i\Bigr|^2 \le \Bigl(\sum_{i=1}^{n} |\beta_i| \, |\varepsilon_i|\Bigr)^2 \le \Bigl(\sum_{i=1}^{n} |\beta_i|^2\Bigr)\Bigl(\sum_{i=1}^{n} |\varepsilon_i|^2\Bigr).$

The sum of the $|\beta_i|^2$ is positive because not all $\beta_i$ are zero.
Comparing the left- and right-hand sides of these relations we
therefore deduce that

$\sum_{i=1}^{n} |\varepsilon_i|^2 \ge 1.$

Thus the inequality obtained means that if the condition

$\sum_{i=1}^{n} |\varepsilon_i|^2 < 1$ (33.2)

holds, then the system of vectors

$e_1 + \varepsilon_1, \; e_2 + \varepsilon_2, \; \ldots, \; e_n + \varepsilon_n$

is clearly linearly independent.

The indicated feature of orthonormal systems has determined their
wide use in constructing various computational algorithms associated
with expansions with respect to bases.
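Condition (33.2) can be tried out on the standard basis of coordinate space, which is orthonormal for the standard scalar product. The sketch below is an illustration under that assumption; linear independence is tested through a determinant computed by a small hand-written Gaussian elimination, and all names and perturbation values are hypothetical. The second example, where the sum in (33.2) equals exactly 1 and the system becomes dependent, is an added observation that is not taken from the text.

```python
def det(rows):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [list(map(float, r)) for r in rows]
    n = len(a)
    d = 1.0
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(a[i][k]))
        if a[p][k] == 0.0:
            return 0.0
        if p != k:
            a[k], a[p] = a[p], a[k]
            d = -d
        d *= a[k][k]
        for i in range(k + 1, n):
            m = a[i][k] / a[k][k]
            for j in range(k, n):
                a[i][j] -= m * a[k][j]
    return d

def perturb(basis, eps):
    """Rows e_i + eps_i."""
    return [[b + e for b, e in zip(bv, ev)] for bv, ev in zip(basis, eps)]

def total_sq(eps):
    """Sum over i of |eps_i|^2, the left-hand side of (33.2)."""
    return sum(c * c for ev in eps for c in ev)

# Perturbation satisfying (33.2): the perturbed system stays independent.
e3 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
eps = [[0.3, -0.2, 0.1], [0.0, 0.4, -0.3], [-0.2, 0.1, 0.2]]
total = total_sq(eps)
perturbed = perturb(e3, eps)

# Boundary case with the sum equal to 1: the two rows become opposite vectors.
e2 = [[1.0, 0.0], [0.0, 1.0]]
sharp_eps = [[-0.5, -0.5], [-0.5, -0.5]]
sharp_total = total_sq(sharp_eps)
sharp = perturb(e2, sharp_eps)
```

For the first perturbation `total` is about 0.48 and `det(perturbed)` is about 2.24, so the rows are independent. In the boundary case `det(sharp)` is zero, which suggests that the bound in (33.2) cannot be relaxed in general.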

Exercises

1. Let $e_1, e_2, \ldots, e_n$ be an orthogonal basis of a Euclidean
space. Prove that a system of vectors $x_1, x_2, \ldots, x_n$ is linearly
independent if

$\sum_{i=1}^{n} \cos\{e_i, x_i\} > n - \frac{1}{2}.$

2. Let vectors $x_i = (x_{i1}, x_{i2}, \ldots, x_{in})$, for i = 1, 2, ..., n, be given by
their coordinates in an arbitrary basis. Prove that if

$|x_{ii}|^2 > n \sum_{k \ne i} |x_{ik}|^2$

for every i, then the system $x_1, x_2, \ldots, x_n$ is linearly independent.

34. Vector and triple scalar products
Again we begin our studies with a space of
directed line segments. As always we assume some fixed Cartesian
coordinate system with origin O and basis i, j and k.

Three vectors are said to be a triple if it is stated which vector
is the first, which is the second and which is the third. In writing
we shall arrange the vectors of a triple consecutively from left to
right.

A triple of noncoplanar vectors a, b and c is said to be right-
(left-)handed if they are arranged the way the thumb, the unbent
forefinger and the middle finger of the right (left) hand can be held
respectively.

Of any three noncoplanar vectors a, b and c it is possible to
compose the following six triples:

abc, bca, cab, bac, acb, cba.

The first three are of the same sign as abc, and the others are of the
opposite sign. Notice that interchanging any two vectors in any
triple makes the triple change sign.

An affine or Cartesian coordinate system is said to be right- (left-)
handed if its basis vectors form a right- (left-)handed triple. So far
our studies have been independent of the sign the basis of the
coordinate system had. Now some differences will appear in our
investigations. For the sake of definiteness therefore in what follows we
shall consider only right-handed coordinate systems.

Suppose two noncollinear vectors a and b are given. We assign
to them a third vector c satisfying the following conditions:

(1) c is orthogonal to each of the vectors a and b,

(2) abc is a right-handed triple,

(3) the length of c equals the area S of the parallelogram constructed
on the vectors a and b applied to a common origin.

If a and b are collinear, then we assign to such a pair of vectors
a zero vector.

The resulting correspondence is an algebraic operation in a space $V_3$.
It is called vector multiplication of vectors a and b and designated

$c = [a, b].$



110 


The Volume of a System of Vectors 


[Ch. 1 


Consider the basis vectors i, j and k. By the definition of a vector
product

$[i, i] = 0, \quad [i, j] = k, \quad [i, k] = -j,$
$[j, i] = -k, \quad [j, j] = 0, \quad [j, k] = i,$ (34.1)
$[k, i] = j, \quad [k, j] = -i, \quad [k, k] = 0.$

From these relations it follows in particular that the operation of
vector product is noncommutative.
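Table (34.1) can be reproduced in coordinates. The sketch below assumes the familiar coordinate formula for the vector product in a right-handed Cartesian system, which is not derived at this point of the text; the function name `cross` and the sample vectors are illustrative.

```python
def cross(a, b):
    """Vector product [a, b] in a right-handed Cartesian coordinate system."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def neg(v):
    return tuple(-c for c in v)

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
zero = (0, 0, 0)

# The nine products of (34.1), row by row.
table = [
    (cross(i, i), zero),   (cross(i, j), k),      (cross(i, k), neg(j)),
    (cross(j, i), neg(k)), (cross(j, j), zero),   (cross(j, k), i),
    (cross(k, i), j),      (cross(k, j), neg(i)), (cross(k, k), zero),
]
table_ok = all(computed == expected for computed, expected in table)

# Condition (3) of the definition: |[a, b]| is the parallelogram area.
a, b = (2, 0, 0), (1, 3, 0)   # base 2, height 3
c = cross(a, b)
```

Here `c` equals (0, 0, 6), whose length 6 is the area of the parallelogram with base 2 and height 3, and `cross(i, j)` differs from `cross(j, i)`, in line with the remark on noncommutativity.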

Every triple abc of noncoplanar vectors applied to a common
point O determines some parallelepiped. The point O is one of its
vertices, and the vectors a, b and c are its edges. We designate the
volume of that parallelepiped as V(a, b, c) thus emphasizing its
dependence on the vectors a, b and c. If a triple a, b and c is
coplanar, then the volume is assumed to be zero. We now assign
to the volume a plus sign if the noncoplanar triple abc is
right-handed and a minus sign if it is left-handed. The new concept
thus defined will be called the oriented volume of the parallelepiped
and designated $V^{\pm}(a, b, c)$.

A volume and an oriented volume may be regarded as some numerical
functions of three independent vector variables assuming certain real
values for every vector triple abc. A volume is always nonnegative,
and an oriented volume may have any sign. We shall later see that
separating these concepts does make sense.

Let a, b and c be three arbitrary vectors. If we carry out vector
multiplication of a on the right by b and then perform scalar
multiplication of the vector [a, b] by c, then the resulting number
([a, b], c) is said to be a triple scalar product of the vectors a, b and c.

Theorem 34.1. A triple scalar product ([a, b], c) is equal to the
oriented volume of the parallelepiped constructed on vectors a, b and c
applied to a common origin.

Proof. We may assume without loss of generality that a and b
are noncollinear since otherwise [a, b] = 0 and the statement of the
theorem is obvious. Let S be as before the area of the parallelogram
constructed on a and b. By (26.4)

$([a, b], c) = |[a, b]| \, \{\mathrm{pr}_{[a, b]} c\} = S \, \{\mathrm{pr}_{[a, b]} c\}.$ (34.2)

Suppose a, b and c are noncoplanar. Then $\{\mathrm{pr}_{[a, b]} c\}$ is up to
a sign equal to the altitude h of the parallelepiped constructed on
the vectors a, b and c applied to a common origin provided that the
base of the parallelepiped is the parallelogram constructed on a
and b (Fig. 34.1). Thus the right-hand side of (34.2) is up to a sign
equal to the volume of the parallelepiped constructed on a, b and c.

It is obvious that $\{\mathrm{pr}_{[a, b]} c\} = +h$ if the vectors [a, b] and c
are on the same side of the plane determined by a and b. But in this
case the triple abc is also right-handed. Otherwise $\{\mathrm{pr}_{[a, b]} c\} = -h$.
If the vectors a, b and c are coplanar, then c is in the plane given by a
and b, and therefore $\{\mathrm{pr}_{[a, b]} c\} = 0$. Thus the theorem is proved.

Corollary. For any three vectors a, b and c

$([a, b], c) = (a, [b, c]).$ (34.3)

Indeed, from the symmetry of a scalar product it follows that
(a, [b, c]) = ([b, c], a), and therefore it suffices to show that
([a, b], c) = ([b, c], a). But the last equation is obvious since the
triples abc and bca are of the same sign and their parallelepipeds
coincide.
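Relation (34.3) and the sign behaviour of the oriented volume can be checked on a concrete triple. As before, this is a sketch that assumes the standard coordinate formulas for the scalar and vector products in a right-handed Cartesian system; the name `triple` and the sample vectors are hypothetical.

```python
def dot(x, y):
    """Scalar product in Cartesian coordinates."""
    return sum(p * q for p, q in zip(x, y))

def cross(a, b):
    """Vector product [a, b] in a right-handed Cartesian coordinate system."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def triple(a, b, c):
    """Triple scalar product ([a, b], c), i.e. the oriented volume."""
    return dot(cross(a, b), c)

a, b, c = (1, 2, 0), (0, 1, 3), (2, 0, 1)

left = triple(a, b, c)           # ([a, b], c)
right = dot(a, cross(b, c))      # (a, [b, c]), the other side of (34.3)

swapped = triple(b, a, c)        # interchanging two vectors changes the sign
cyclic = triple(b, c, a)         # bca has the same sign as abc

# A coplanar triple: the third vector is a + b, so the volume is zero.
a_plus_b = tuple(p + q for p, q in zip(a, b))
flat = triple(a, b, a_plus_b)

unit = triple((1, 0, 0), (0, 1, 0), (0, 0, 1))   # oriented volume of i, j, k
```

For these vectors both sides of (34.3) equal 13, the swapped triple gives -13, the coplanar triple gives 0 and the basis triple gives 1.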

Relation (34.3) allows efficient algebraic studies to be carried out.
We first prove that for any vectors a, b and c and any real number $\alpha$
the following properties of vector multiplication hold:

(1) $[a, b] = -[b, a]$,

(2) $[\alpha a, b] = \alpha [a, b]$,

(3) $[a + b, c] = [a, c] + [b, c]$,

(4) $[a, a] = 0$.

Property 4 follows in an obvious way from the definition. To
prove the remaining properties we shall use the fact that vectors x
and y are equal if and only if

$(x, d) = (y, d)$

for any vector d.

Let d be an arbitrary vector. The triples abd and bad are of opposite
signs. Consequently, from Theorem 34.1 and the properties of a scalar
product we conclude that

$([a, b], d) = -([b, a], d) = (-[b, a], d).$

Since d is an arbitrary vector, this means that [a, b] = -[b, a]
and the first property is proved.

To prove the second and the third property we proceed similarly 
but in addition take into account (34.3). We have 

([αa, b], d) = (αa, [b, d]) = α (a, [b, d])

= α ([a, b], d) = (α [a, b], d),
which shows the validity of Property 2. Further
([a + b, c], d) = (a + b, [c, d]) = (a, [c, d]) + (b, [c, d])

= ([a, c], d) + ([b, c], d) = ([a, c] + [b, c], d),
and Property 3 is also true. Similarly for the second factor
[a, αb] = -[αb, a] = -α [b, a] = α [a, b],
[a, b + c] = -[b + c, a] = -[b, a] - [c, a] = [a, b] + [a, c].
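
The properties just proved can be spot-checked numerically. The following is only an illustrative sketch, not a proof: it assumes that the vectors are given by integer coordinates in a right-handed orthonormal basis and that [a, b] is computed by the coordinate formula (34.5). The helper names `cross`, `add` and `scale` are introduced here for illustration.

```python
# Spot-check of Properties 1-4 of the vector product on integer vectors.
# Assumption: coordinates refer to a right-handed orthonormal basis,
# so [a, b] may be computed by the coordinate formula (34.5).

def cross(a, b):
    """Vector product [a, b] of two 3-component tuples."""
    x1, y1, z1 = a
    x2, y2, z2 = b
    return (y1 * z2 - y2 * z1, z1 * x2 - z2 * x1, x1 * y2 - x2 * y1)

def add(a, b):
    """Componentwise sum a + b."""
    return tuple(p + q for p, q in zip(a, b))

def scale(t, a):
    """Product of the number t and the vector a."""
    return tuple(t * p for p in a)

a, b, c = (1, 2, 3), (-4, 0, 5), (2, -1, 7)
alpha = 3

checks = {
    "(1) [a, b] = -[b, a]":
        cross(a, b) == scale(-1, cross(b, a)),
    "(2) [alpha a, b] = alpha [a, b]":
        cross(scale(alpha, a), b) == scale(alpha, cross(a, b)),
    "(3) [a + b, c] = [a, c] + [b, c]":
        cross(add(a, b), c) == add(cross(a, c), cross(b, c)),
    "(4) [a, a] = 0":
        cross(a, a) == (0, 0, 0),
    "[a, b + c] = [a, b] + [a, c]":
        cross(a, add(b, c)) == add(cross(a, b), cross(a, c)),
}
for name, ok in checks.items():
    print(name, ok)
```

Integer coordinates are used so that the equalities are tested exactly, without rounding.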

Now we can investigate the algebraic properties of an oriented 
volume as a function given on triples of vectors. For example, let a 
be a linear combination of some vectors a' and a''. Then

([αa' + βa'', b], c) = (α [a', b] + β [a'', b], c)

= α ([a', b], c) + β ([a'', b], c).

Consequently

$V^{\pm}(\alpha a' + \beta a'', b, c) = \alpha V^{\pm}(a', b, c) + \beta V^{\pm}(a'', b, c)$

for any vectors a' and a'' and any real numbers α and β.

When two independent variables are interchanged, the oriented 
volume only changes sign and therefore a similar property for a linear 
combination holds in each independent variable. Bearing this in 
mind we shall say that an oriented volume is a linear function in 
each of the independent variables. 

If vectors abc are linearly dependent, then they are coplanar and 
therefore the oriented volume is equal to zero. Also, considering 
(34.1) we find that 

$V^{\pm}(i, j, k) = ([i, j], k) = (k, k) = 1$.

So it may be concluded that as a function an oriented volume pos¬ 
sesses the following properties: 

(A) an oriented volume is a linear function in 
each of the independent variables, 

(B) an oriented volume is equal to zero on 

all linearly dependent systems, (34.4) 

(C) an oriented volume is equal to unity at least 
on one fixed orthonormal system of vectors. 

Of course we have by no means formulated all the properties of an
oriented volume. Both the properties listed in (34.4) and others
can easily be established if we know explicit expressions of vector
and triple scalar products in terms of the coordinates of vectors
a, b and c.

Theorem 34.2. If vectors a and b are given by their Cartesian
coordinates

$a = (x_1, y_1, z_1)$,

$b = (x_2, y_2, z_2)$,

then the vector product will have the following coordinates:

$[a, b] = (y_1 z_2 - y_2 z_1,\ z_1 x_2 - z_2 x_1,\ x_1 y_2 - x_2 y_1)$. (34.5)

Proof. Considering that giving the coordinates of the vectors
determines the decompositions

$a = x_1 i + y_1 j + z_1 k$,
$b = x_2 i + y_2 j + z_2 k$,

and relying on the algebraic properties of a vector product we find

$[a, b] = x_1 x_2 [i, i] + x_1 y_2 [i, j] + x_1 z_2 [i, k] + y_1 x_2 [j, i]$

$+ y_1 y_2 [j, j] + y_1 z_2 [j, k] + z_1 x_2 [k, i] + z_1 y_2 [k, j] + z_1 z_2 [k, k]$.

The statement of the theorem now follows from (34.1).

Corollary. If a vector c is also given by coordinates $x_3$, $y_3$ and $z_3$
in the same Cartesian system, then

$([a, b], c) = x_1 y_2 z_3 + x_2 y_3 z_1 + x_3 y_1 z_2 - x_1 y_3 z_2 - x_2 y_1 z_3 - x_3 y_2 z_1$. (34.6)
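
Formulas (34.5) and (34.6) are easy to compare on a concrete triple of vectors. The sketch below assumes integer coordinates in a right-handed orthonormal basis; the function names `cross`, `dot` and `triple` are illustrative only. It checks that (34.6) agrees with the scalar product of [a, b] and c, that relation (34.3) holds, and that properties B and C of (34.4) hold on sample data.

```python
# Triple scalar product in coordinates: formula (34.6) versus ([a, b], c).

def cross(a, b):
    """Vector product [a, b] by formula (34.5)."""
    x1, y1, z1 = a
    x2, y2, z2 = b
    return (y1 * z2 - y2 * z1, z1 * x2 - z2 * x1, x1 * y2 - x2 * y1)

def dot(a, b):
    """Scalar product (a, b) in an orthonormal basis."""
    return sum(p * q for p, q in zip(a, b))

def triple(a, b, c):
    """Right-hand side of formula (34.6)."""
    x1, y1, z1 = a
    x2, y2, z2 = b
    x3, y3, z3 = c
    return (x1 * y2 * z3 + x2 * y3 * z1 + x3 * y1 * z2
            - x1 * y3 * z2 - x2 * y1 * z3 - x3 * y2 * z1)

i, j, k = (1, 0, 0), (0, 1, 0), (0, 0, 1)
a, b, c = (1, 2, 3), (-4, 0, 5), (2, -1, 7)
a_plus_b = tuple(p + q for p, q in zip(a, b))

print(triple(a, b, c) == dot(cross(a, b), c))  # (34.6) agrees with (34.5)
print(triple(a, b, c) == dot(a, cross(b, c)))  # relation (34.3)
print(triple(i, j, k) == 1)                    # property C of (34.4)
print(triple(a, b, a_plus_b) == 0)             # property B: coplanar triple
```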

The introduction of an oriented volume and the study of its algeb¬ 
raic properties allow us to make important conclusions concerning 
length, area and volume. 

Notice that right- and left-handed bases determine partition of 
the set of all bases of a space into two classes. The term “right- 
and left-handed" has no deep meaning of its own but is merely 
a convenient way of identifying the class to which one basis or 
another belongs. The concept of oriented volume is actually also 
related to these two classes. 

We have already met with such facts. All bases on a straight line 
can also be divided into two classes by combining in the same class 
vectors pointing in the same direction. It turns out that the mag¬ 
nitude of a directed line segment is closely similar to an oriented 
volume if both notions are considered as functions on systems of 
vectors. Property A holds according to relations (9.8). Property B
is true since the magnitude of a zero line segment is zero. That 
Property C holds is obvious. 

A similar investigation could be carried out independently in the 
case of the plane too. It is more simple, however, to use the results 
already obtained. Fix some x, y Cartesian system. Supplement 
it to a right-handed x, y, z coordinate system in space. Note that 
depending on the location of the x and y axes the z axis may have 
one of the two possible directions. This again determines partition 
of the set of bases of the plane into two classes. The oriented area 
$S^{\pm}(a, b)$ of the parallelogram constructed on vectors a and b in
the x, y plane may be defined, for example, by the equation
$S^{\pm}(a, b) = V^{\pm}(a, b, k)$. Properties A, B and C again hold of
course. 


Thus, assigning to lengths, areas and volumes some signs and
considering them as functions given on systems of vectors we can 
make all these functions have the same algebraic properties A, B 
and C of (34.4). 


Exercises 

1. Prove that a, b and c are coplanar if and only if
their triple scalar product is zero.

2. Prove that for any three vectors a, b and c

[a, [b, c]] = (a, c) b - (a, b) c.

3. Prove that vector multiplication is not an associative operation. 

4. Find an expression of the oriented area of a parallelogram in terms of the 
Cartesian coordinates of vectors in the plane. 

5. Will formulas (34.5) and (34.6) change if the coordinate system relative 
to which the vectors are given is left-handed? 


35. Volume and oriented volume 
of a system of vectors 

In vector spaces of directed line segments
area and volume are concepts derived from the length of a line segment.
We have already extended the concept of length to abstract Euclidean
spaces. We now consider a similar problem for area and volume.




Let $x_1$ and $x_2$ be two noncollinear vectors in the plane. Construct
on them a parallelogram, taking $x_1$ as its base (Fig. 35.1). Drop
from the terminal point of $x_2$ to the base a perpendicular h. The
area $S(x_1, x_2)$ of the parallelogram will be defined by the formula

$S(x_1, x_2) = |x_1|\,|h|$. (35.1)

Denote by $L_0$ a zero subspace and by $L_1$ the span constructed
on $x_1$. Since

$|x_1| = |\mathrm{ort}_{L_0} x_1|, \quad |h| = |\mathrm{ort}_{L_1} x_2|$,
formula (35.1) can be written as

$S(x_1, x_2) = |\mathrm{ort}_{L_0} x_1|\,|\mathrm{ort}_{L_1} x_2|$. (35.2)

Take then three noncoplanar vectors $x_1$, $x_2$ and $x_3$ in space.
Construct on them a parallelepiped, taking as its base the parallelogram
formed by $x_1$ and $x_2$ (Fig. 35.2). Drop from the terminal point of $x_3$
to the base a perpendicular $h_1$. The volume $V(x_1, x_2, x_3)$ of the
parallelepiped will be defined by

$V(x_1, x_2, x_3) = S(x_1, x_2)\,|h_1|$.

If by $L_2$ we denote the span constructed on $x_1$ and $x_2$, then by (35.2)

$V(x_1, x_2, x_3) = |\mathrm{ort}_{L_0} x_1|\,|\mathrm{ort}_{L_1} x_2|\,|\mathrm{ort}_{L_2} x_3|$.

Thus the length of a vector, the area of a parallelogram and the
volume of a parallelepiped are expressed in vector spaces $V_1$, $V_2$
and $V_3$ by formulas in which one cannot but see a certain regularity:

$|x_1| = |\mathrm{ort}_{L_0} x_1|$,

$S(x_1, x_2) = |\mathrm{ort}_{L_0} x_1|\,|\mathrm{ort}_{L_1} x_2|$, (35.3)

$V(x_1, x_2, x_3) = |\mathrm{ort}_{L_0} x_1|\,|\mathrm{ort}_{L_1} x_2|\,|\mathrm{ort}_{L_2} x_3|$.

In particular, the number of factors coincides everywhere with the 
dimension of the space. 

These formulas suggest how to introduce the concept of volume
in a Euclidean space $E_n$ of dimension n. Let $x_1, x_2, \ldots, x_n$ be an
arbitrary system of vectors in $E_n$. Denote by $L_0$ a zero subspace and
by $L_i$ the span formed by vectors $x_1, \ldots, x_i$. Then by analogy with
spaces of directed line segments we say that:

The volume $V(x_1, x_2, \ldots, x_n)$ of a system of vectors $x_1, x_2, \ldots, x_n$
of a Euclidean space $E_n$ is the value on that system of a real-valued
function of n independent vector variables in $E_n$ defined by
the following equation:

$V(x_1, x_2, \ldots, x_n) = \prod_{i=0}^{n-1} |\mathrm{ort}_{L_i} x_{i+1}|$. (35.4)

Of course we cannot as yet say that the volume of a system of 
vectors possesses all the properties inherent in a volume for any n. 
But for Euclidean spaces of dimensions 1, 2 and 3 respectively, by 
virtue of Euclidean isomorphism and relations (35.3), it clearly has 
the same properties as have the length of a line segment, the area 
of a parallelogram and the volume of a parallelepiped. 
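
Definition (35.4) can be evaluated directly by computing the successive perpendiculars. The following sketch assumes that the vectors are given by coordinates in an orthonormal basis, so that the scalar product is the sum of products of coordinates. The names `dot` and `volume` and the tolerance `tol` are choices of this sketch and do not come from the text.

```python
import math

def dot(u, v):
    """Scalar product in an orthonormal basis."""
    return sum(p * q for p, q in zip(u, v))

def volume(vectors, tol=1e-12):
    """Volume (35.4): product of |ort_{L_i} x_{i+1}| for i = 0, ..., n-1."""
    orthogonal = []  # nonzero, mutually orthogonal vectors spanning L_i
    result = 1.0
    for x in vectors:
        # Perpendicular from x to L_i: subtract the projection of x onto L_i.
        h = [float(t) for t in x]
        for q in orthogonal:
            coeff = dot(h, q) / dot(q, q)
            h = [hp - coeff * qp for hp, qp in zip(h, q)]
        length = math.sqrt(dot(h, h))
        result *= length
        # A (numerically) zero perpendicular adds nothing to the span.
        if length > tol:
            orthogonal.append(h)
    return result

area = volume([(3, 0), (1, 2)])                    # parallelogram in the plane
solid = volume([(1, 0, 0), (1, 1, 0), (1, 1, 1)])  # parallelepiped in space
flat = volume([(1, 2), (2, 4)])                    # collinear vectors
print(area, solid, flat)
```

For n = 2 and n = 3 the computed values can be compared with the usual area of a parallelogram and volume of a parallelepiped, in agreement with relations (35.3).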

We now try to approach the concept of the volume of a system of
vectors of a Euclidean space $E_n$ from another point of view. As
already noted, assigning definite signs turns length, area and volume
into algebraic functions possessing some common properties. Therefore
it may be expected that there are similar properties in an arbitrary
Euclidean space, too. Bearing this in mind we give the following
definition:

The oriented volume $V^{\pm}(x_1, x_2, \ldots, x_n)$ of a system of vectors
$x_1, x_2, \ldots, x_n$ of a Euclidean space $E_n$ is the value on that system
of a real-valued function of n independent vector variables in $E_n$
possessing properties (34.4).

Much is unclear about this definition too. Thus we do not know if
there is an oriented volume for any system of vectors in an arbitrary
Euclidean space for n ≥ 4. But even if there is, is it uniquely defined
by properties (34.4)? And finally what relation is there between volume
and oriented volume in general? Now we can answer only the
last question, and only for the case n = 1, 2, 3.

We shall sometimes have to consider volume and oriented volume 
in a space $E_n$ for systems containing fewer than n vectors. This will
mean that in fact we are dealing not with the whole space but with
some of its subspaces from which a given system is taken. According¬ 
ly properties (34.4) will be considered only in relation to vectors 
of the same subspace. A need may arise to consider volume and 
oriented volume for systems containing more than n vectors. Accord¬ 
ing to formula (35.4) and Property B of (34.4) both functions must 
be zero on such systems. 

In conclusion note that the use of two different notions associated
with volume will substantially simplify their investigation, since 
one concept reflects the geometrical aspect of the problem concerned 
and the other reflects its algebraic aspect. We shall soon see that 
there is a very close relation between them. We shall also see that 
it is important to introduce these notions because they generate 
a mathematical tool whose significance is not limited to the volume 
problem. 


Exercises 

1. Prove that in spaces of directed line segments oriented
length, area and volume are defined uniquely by conditions (34.4).

2. Will the same concepts be uniquely defined if one of the conditions (34.4)
is excluded?

3. Prove that in any Euclidean space $V(x_1, x_2) = |x_1| \cdot |x_2|$ if and only
if vectors $x_1$ and $x_2$ are orthogonal.

4. Prove that in any Euclidean space $V(x_1, x_2) = V(x_2, x_1)$.


36. Geometrical and algebraic properties 
of a volume 

We begin our study of the concept of volume 
in a Euclidean space E n by exploring its geometrical and algebraic 
properties following from its definition. 

Property 1. Always $V(x_1, x_2, \ldots, x_n) \geq 0$. The equation
$V(x_1, x_2, \ldots, x_n) = 0$ holds if and only if the system of vectors
$x_1, x_2, \ldots, x_n$ is linearly dependent.

The first part of the statement follows in an obvious way from
(35.4), and therefore only its second part needs to be proved. Let
$x_1, x_2, \ldots, x_n$ be a linearly dependent system. If $x_1 = 0$, then by
definition so is the volume. If $x_1 \neq 0$, then some vector $x_{k+1}$ is linearly
expressible in terms of the preceding vectors $x_1, \ldots, x_k$. But then
$\mathrm{ort}_{L_k} x_{k+1} = 0$ and again the volume is zero.

Suppose now that the volume is zero. According to the definition
this means that so is one of the multipliers on the right of (35.4).
For that multiplier let i = k. If k = 0, then $x_1 = 0$. If $k \neq 0$,
then the condition $\mathrm{ort}_{L_k} x_{k+1} = 0$ implies that $x_{k+1}$ is in the span
formed by the vectors $x_1, \ldots, x_k$, i.e. that the system $x_1, \ldots, x_{k+1}$
is linearly dependent. So is the entire system of vectors
$x_1, x_2, \ldots, x_n$ in both cases.

Property 2. For any system of vectors $x_1, x_2, \ldots, x_n$

$V(x_1, x_2, \ldots, x_n) \leq \prod_{i=0}^{n-1} |x_{i+1}|$ (36.1)

(Hadamard's inequality), with equality holding if and only if the system
$x_1, x_2, \ldots, x_n$ is orthogonal or contains a zero vector.

According to the properties of a perpendicular and a projection
clearly

$|\mathrm{ort}_{L_i} x_{i+1}| \leq |x_{i+1}|$, (36.2)

the inequality becoming an equation if and only if $x_{i+1} \perp L_i$ or
equivalently if $x_{i+1}$ is orthogonal to the vectors $x_1, x_2, \ldots, x_i$.
Consider the product of the left- and right-hand sides of inequalities
of the form (36.2) for all i. We have

$\prod_{i=0}^{n-1} |\mathrm{ort}_{L_i} x_{i+1}| \leq \prod_{i=0}^{n-1} |x_{i+1}|$.

If all vectors of the system $x_1, x_2, \ldots, x_n$ are nonzero, then this
inequality becomes an equation if and only if the system is orthogonal.
The case of a zero vector is trivial.

It is possible to deduce several useful properties from Hadamard's
inequality. Let $x_1, x_2, \ldots, x_n$ be a normed system. Then it is
obvious that

$V(x_1, x_2, \ldots, x_n) \leq 1$.

The following statement is also true. If the system $x_1, x_2, \ldots, x_n$
is normed and its volume equals unity, then it is orthonormal.
Since the volume of any normed system is not greater than unity,
this means that of all normed systems the orthonormal system has
a maximum volume.

Property 3. For any two orthogonal sets of vectors $x_1, x_2, \ldots, x_p$
and $y_1, y_2, \ldots, y_r$

$V(x_1, x_2, \ldots, x_p, y_1, y_2, \ldots, y_r)$

$= V(x_1, x_2, \ldots, x_p)\, V(y_1, y_2, \ldots, y_r)$.

Denote by $L_i$ the span formed by the first i vectors of the joined
system $x_1, \ldots, x_p, y_1, \ldots, y_r$ and by $K_i$ the span formed by
vectors $y_1, \ldots, y_i$. Under the hypothesis each of the vectors in
the system $y_1, \ldots, y_r$ is orthogonal to all vectors of the system
$x_1, \ldots, x_p$. Therefore

$L_{p+i} = L_p \oplus K_i$

for every i from 0 to r. Now, taking into account (30.10), we have

$V(x_1, \ldots, x_p, y_1, \ldots, y_r) = \Bigl(\prod_{i=0}^{p-1} |\mathrm{ort}_{L_i} x_{i+1}|\Bigr) \Bigl(\prod_{i=0}^{r-1} |\mathrm{ort}_{L_{p+i}} y_{i+1}|\Bigr)$

$= \Bigl(\prod_{i=0}^{p-1} |\mathrm{ort}_{L_i} x_{i+1}|\Bigr) \Bigl(\prod_{i=0}^{r-1} |\mathrm{ort}_{K_i} y_{i+1}|\Bigr)$

$= V(x_1, \ldots, x_p)\, V(y_1, \ldots, y_r)$.

Before proceeding to further studies we make a remark. The 
volume of a system is only expressible in terms of the perpendiculars 
dropped to spans formed by the preceding vectors. Taking into 
account the properties of perpendiculars, it may therefore be con¬ 
cluded that the volume of the system will remain unaffected if to 
any vector any linear combination of the preceding vectors is added. 
In particular, the volume will remain unaffected if any vector is 
replaced by the perpendicular from that vector to any span formed 
by the preceding vectors. 

Property 4. The volume of a system of vectors remains unaffected 
by any rearrangement of vectors in the system. 

Consider first the case where in a system of vectors $x_1, \ldots, x_n$
two adjacent vectors $x_{p+1}$ and $x_{p+2}$ are interchanged. According
to the above remark the volume will remain unaffected if $x_{p+1}$
and $x_{p+2}$ are replaced by the vectors $\mathrm{ort}_{L_p} x_{p+1}$ and $\mathrm{ort}_{L_p} x_{p+2}$,
and $x_{p+3}, \ldots, x_n$ are replaced by $\mathrm{ort}_{L_{p+2}} x_{p+3}, \ldots, \mathrm{ort}_{L_{p+2}} x_n$.
But now three sets of vectors

$x_1, \ldots, x_p$,

$\mathrm{ort}_{L_p} x_{p+1},\ \mathrm{ort}_{L_p} x_{p+2}$,

$\mathrm{ort}_{L_{p+2}} x_{p+3}, \ldots, \mathrm{ort}_{L_{p+2}} x_n$

are mutually orthogonal and in view of Property 3 we have

$V(x_1, \ldots, x_{p+1}, x_{p+2}, \ldots, x_n) = V(x_1, \ldots, x_p)$

$\times\, V(\mathrm{ort}_{L_p} x_{p+1}, \mathrm{ort}_{L_p} x_{p+2}) \cdot V(\mathrm{ort}_{L_{p+2}} x_{p+3}, \ldots, \mathrm{ort}_{L_{p+2}} x_n)$.

It is clear that the spans of the vectors $x_1, \ldots, x_p, x_{p+1}, x_{p+2}$
and $x_1, \ldots, x_p, x_{p+2}, x_{p+1}$ coincide; consequently

$V(x_1, \ldots, x_{p+2}, x_{p+1}, \ldots, x_n) = V(x_1, \ldots, x_p)$

$\times\, V(\mathrm{ort}_{L_p} x_{p+2}, \mathrm{ort}_{L_p} x_{p+1}) \cdot V(\mathrm{ort}_{L_{p+2}} x_{p+3}, \ldots, \mathrm{ort}_{L_{p+2}} x_n)$.

By virtue of Euclidean isomorphism the volume of a system of
two vectors possesses the same properties as the area of a parallelogram
does. In particular, it is independent of the order of the vectors
of the system. Comparing the right-hand sides of the last two equations
we now conclude that

$V(x_1, \ldots, x_{p+1}, x_{p+2}, \ldots, x_n)$

$= V(x_1, \ldots, x_{p+2}, x_{p+1}, \ldots, x_n)$.

A little later we shall prove that any permutation $x_{j_1}, x_{j_2}, \ldots, x_{j_n}$
of vectors of a system $x_1, x_2, \ldots, x_n$ can be obtained by
successive interchanging of adjacent vectors. For an arbitrary permutation
therefore Property 4 follows from the above special case.

Property 5. The volume of a system of vectors is an absolutely
homogeneous function, i.e.

$V(x_1, \ldots, \alpha x_p, \ldots, x_n) = |\alpha|\, V(x_1, \ldots, x_p, \ldots, x_n)$
for any p.

By Property 4 we may assume without loss of generality that
p = n. But then in view of (30.6) we get

$V(x_1, \ldots, x_{n-1}, \alpha x_n) = \Bigl(\prod_{i=0}^{n-2} |\mathrm{ort}_{L_i} x_{i+1}|\Bigr) |\mathrm{ort}_{L_{n-1}} (\alpha x_n)|$

$= |\alpha| \prod_{i=0}^{n-1} |\mathrm{ort}_{L_i} x_{i+1}| = |\alpha|\, V(x_1, \ldots, x_{n-1}, x_n)$.

Property 6. The volume of a system of vectors remains unaffected, 
if to some one of the vectors of the system a linear combination of the 
remaining vectors is added. 

Again by Property 4 we may assume that to the last vector a linear 
combination of the preceding vectors is added. But, as already 
noted, in this case the volume remains unaffected.
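
Properties 2, 4, 5 and 6 can be observed numerically on a small example. The sketch below again assumes coordinates in an orthonormal basis and computes the volume from definition (35.4) by successive perpendiculars. The function `volume`, the sample system `x` and the numbers `alpha`, 3 and -1 are illustrative choices, and the comparisons are made up to rounding error.

```python
import math
from itertools import permutations

def volume(vectors, tol=1e-12):
    """Volume (35.4) computed from successive perpendiculars."""
    basis, result = [], 1.0
    for x in vectors:
        h = [float(t) for t in x]
        for q in basis:  # subtract the projection onto each orthogonal vector
            c = sum(a * b for a, b in zip(h, q)) / sum(b * b for b in q)
            h = [a - c * b for a, b in zip(h, q)]
        length = math.sqrt(sum(a * a for a in h))
        result *= length
        if length > tol:
            basis.append(h)
    return result

x = [(2, 1, 0), (1, 3, 1), (0, 1, 4)]
v = volume(x)

# Property 2 (Hadamard's inequality): V does not exceed the product of lengths.
lengths = math.prod(math.sqrt(sum(t * t for t in xi)) for xi in x)
hadamard_ok = v <= lengths

# Property 4: every rearrangement of the system has the same volume.
order_ok = all(math.isclose(volume(p), v) for p in permutations(x))

# Property 5: multiplying one vector by alpha multiplies the volume by |alpha|.
alpha = -2.5
scaled = [x[0], tuple(alpha * t for t in x[1]), x[2]]
scale_ok = math.isclose(volume(scaled), abs(alpha) * v)

# Property 6: adding to x_1 a linear combination of x_2 and x_3 changes nothing.
shifted = [tuple(a + 3 * b - c for a, b, c in zip(*x)), x[1], x[2]]
shift_ok = math.isclose(volume(shifted), v)

print(v, hadamard_ok, order_ok, scale_ok, shift_ok)
```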

The volume of a system of vectors is a real-valued function.
This function possesses some properties part of which we have already
established. They have confirmed our supposition that the volume
of a system of vectors we have defined possesses in a Euclidean space
all properties inherent in a volume for any n. But the most important
thing is perhaps that the established properties uniquely define
a volume. More precisely, we have
Theorem 36.1. If a real-valued function $F(x_1, x_2, \ldots, x_n)$ of
n independent vector variables in $E_n$ possesses the following properties:

(A) it remains unchanged by addition, to any independent
variable, of any linear combination of the remaining
independent variables,

(B) it is absolutely homogeneous, (36.3)

(C) it equals unity for all orthonormal systems,

then it coincides with the volume of the system of vectors.

Proof. If there is at least one zero independent variable among
$x_1, \ldots, x_n$, then by Property B

$F(x_1, x_2, \ldots, x_n) = V(x_1, x_2, \ldots, x_n) = 0$. (36.4)

Now let $x_1, x_2, \ldots, x_n$ be an arbitrary system. Subtracting from
each vector $x_i$ its projection onto the subspace formed by the vectors
$x_1, \ldots, x_{i-1}$ and taking into account Property A we conclude that

$F(x_1, x_2, \ldots, x_n) = F(\mathrm{ort}_{L_0} x_1, \mathrm{ort}_{L_1} x_2, \ldots, \mathrm{ort}_{L_{n-1}} x_n)$. (36.5)

If the system $x_1, x_2, \ldots, x_n$ is linearly dependent, then there is
at least one zero vector among $\mathrm{ort}_{L_{i-1}} x_i$ and (36.4) again holds.
Suppose the system $x_1, x_2, \ldots, x_n$ is linearly independent. Then
all vectors of the system

$\mathrm{ort}_{L_0} x_1, \mathrm{ort}_{L_1} x_2, \ldots, \mathrm{ort}_{L_{n-1}} x_n$

are nonzero. Since in addition this system is orthogonal, there is an
orthonormal system $e_1, e_2, \ldots, e_n$ such that

$\mathrm{ort}_{L_{i-1}} x_i = |\mathrm{ort}_{L_{i-1}} x_i|\, e_i$.

By Property C

$F(e_1, e_2, \ldots, e_n) = 1$.

It follows from B therefore that

$F(x_1, x_2, \ldots, x_n) = \Bigl(\prod_{i=0}^{n-1} |\mathrm{ort}_{L_i} x_{i+1}|\Bigr) F(e_1, e_2, \ldots, e_n)$

$= V(x_1, x_2, \ldots, x_n)$.

The theorem allows us to state that if we construct in some way
a function possessing properties (36.3), then it will be precisely the
volume of a system of vectors.

Exercises

1. Give a geometrical interpretation of Properties 2,
3 and 6 in spaces of directed line segments.

2. Give a geometrical interpretation of equation (36.5) in spaces of directed
line segments.

3. Can a function satisfying conditions (36.3) be zero on any linearly
independent system of vectors?

4. Suppose relative to an orthonormal basis $e_1, e_2, \ldots, e_n$ a system of vectors
$x_1, x_2, \ldots, x_n$ possesses the property

$(x_i, e_j) = 0$

for i = 2, 3, ..., n and j < i (i = 1, 2, ..., n - 1 and j > i). Find the
expression for the volume $V(x_1, x_2, \ldots, x_n)$ in terms of the coordinates of vectors
$x_1, x_2, \ldots, x_n$ in the basis $e_1, e_2, \ldots, e_n$.

5. What will change in the concept of volume for a complex space?


37. Algebraic properties 
of an oriented volume 

We now proceed to study the algebraic prop¬ 
erties of an oriented volume, laying aside for the time being the 
question of its existence. Our study will be based on Conditions 
A, B and C of (34.4). 

Property 1. The oriented volume of a system of vectors is zero if any
two of its vectors coincide.

This property is a direct consequence of Condition B. It is not 
hard to prove that if A holds, then B and Property 1 are equivalent. 

Property 2. The oriented volume of a system of vectors changes sign 
if some two vectors are interchanged. 

The proof is similar for any two vectors and therefore for the 
sake of simplicity we restrict ourselves to the case where the first 
and second vectors are interchanged. By Property 1 

$V^{\pm}(x_1 + x_2, x_1 + x_2, x_3, \ldots, x_n) = 0$.

But on the other hand according to A
$V^{\pm}(x_1 + x_2, x_1 + x_2, x_3, \ldots, x_n)$

$= V^{\pm}(x_1, x_1, x_3, \ldots, x_n) + V^{\pm}(x_2, x_2, x_3, \ldots, x_n)$

$+ V^{\pm}(x_1, x_2, x_3, \ldots, x_n) + V^{\pm}(x_2, x_1, x_3, \ldots, x_n)$.

On the right of this equation the first two terms are zero from which 
it follows that Property 2 is valid. It is again not hard to prove that 
if A holds, then B and Property 2 are equivalent. 

Property 3. The oriented volume of a system of vectors remains
unaffected by addition, to any vector, of any linear combination of the
remaining vectors.

Again for simplicity, consider only the first vector. According
to A

$V^{\pm}\Bigl(x_1 + \sum_{i=2}^{n} \alpha_i x_i, x_2, \ldots, x_n\Bigr)$

$= V^{\pm}(x_1, x_2, \ldots, x_n) + \sum_{i=2}^{n} \alpha_i V^{\pm}(x_i, x_2, \ldots, x_n)$.

In this equation all terms at the right but the first are zero according
to Property 1.

Property 4. The oriented volume is a homogeneous function, i.e.
$V^{\pm}(x_1, \ldots, \alpha x_p, \ldots, x_n) = \alpha V^{\pm}(x_1, \ldots, x_p, \ldots, x_n)$
for any p.

This property is a direct consequence of A. 

Property 5. The equation $V^{\pm}(x_1, x_2, \ldots, x_n) = 0$ holds if and
only if the system of vectors $x_1, x_2, \ldots, x_n$ is linearly dependent.

Obviously it is only necessary to prove that $V^{\pm}(x_1, x_2, \ldots, x_n) = 0$
implies a linear dependence of the vectors $x_1, x_2, \ldots, x_n$.
Suppose the contrary. Let the oriented volume be zero for
some linearly independent system $y_1, y_2, \ldots, y_n$. This system
is a basis of $E_n$ and therefore for any vector z in $E_n$

$z = \alpha_1 y_1 + \alpha_2 y_2 + \ldots + \alpha_n y_n$.

Now replace in the system $y_1, y_2, \ldots, y_n$ any vector, for example
$y_1$, by a vector z. Using in succession Properties 3 and 4 we find that

$V^{\pm}(z, y_2, \ldots, y_n) = V^{\pm}\Bigl(\alpha_1 y_1 + \sum_{i=2}^{n} \alpha_i y_i, y_2, \ldots, y_n\Bigr)$

$= V^{\pm}(\alpha_1 y_1, y_2, \ldots, y_n) = \alpha_1 V^{\pm}(y_1, y_2, \ldots, y_n) = 0$.

An oriented volume is by definition not zero at least on one linearly
independent system $z_1, z_2, \ldots, z_n$. But replacing in turn vectors
$y_1, y_2, \ldots, y_n$ by $z_1, z_2, \ldots, z_n$ we conclude from Theorem 15.2
that on that system the oriented volume is zero. This contradiction
proves the property in point.

Property 6. If two oriented volumes coincide on at least one linearly
independent system of vectors, then they coincide identically.

Suppose it is known that oriented volumes $V_1^{\pm}(x_1, x_2, \ldots, x_n)$
and $V_2^{\pm}(x_1, x_2, \ldots, x_n)$ coincide on a linearly independent system
$z_1, z_2, \ldots, z_n$. Consider the difference
$F(x_1, x_2, \ldots, x_n) = V_1^{\pm}(x_1, x_2, \ldots, x_n) - V_2^{\pm}(x_1, x_2, \ldots, x_n)$.
This function satisfies Properties 3 and 4 of an oriented volume. Besides
it is zero on all linearly dependent systems and at least on one linearly
independent system $z_1, z_2, \ldots, z_n$. Repeating the arguments used in proving
Property 5 we conclude that $F(x_1, x_2, \ldots, x_n)$ is zero on all linearly
independent systems, i.e. that it is identically zero.

It follows from Property 6 that an oriented volume is uniquely 
defined by conditions (34.4) if we fix the orthonormal system on 
which it must equal unity. 

Property 7. The absolute value of the oriented volume of a system of 
vectors coincides with the volume of the same system. 

Let an oriented volume equal unity on an orthonormal system
$z_1, z_2, \ldots, z_n$. Consider functions $|V^{\pm}(x_1, x_2, \ldots, x_n)|$ and
$V(x_1, x_2, \ldots, x_n)$. They both satisfy A and B of (36.3) and coincide
on a linearly independent system $z_1, z_2, \ldots, z_n$. The function

$\Phi(x_1, x_2, \ldots, x_n) = \bigl|\, |V^{\pm}(x_1, x_2, \ldots, x_n)| - V(x_1, x_2, \ldots, x_n) \,\bigr|$

also satisfies A and B of (36.3) and is zero on all linearly dependent
systems and at least on one linearly independent system
$z_1, z_2, \ldots, z_n$. Repeating again the arguments used in proving Property 5
we conclude that $\Phi(x_1, x_2, \ldots, x_n)$ is identically zero.

The last property is very important, since it allows us to state 
that the absolute value of an oriented volume must have all the 
properties a volume has. In particular, it must equal unity on all 
orthonormal systems, and not only on one. Hadamard’s inequality 
holds for it, and so on. This property supplies a final answer to all 
questions posed by us concerning volume and oriented volume. The 
only thing we lack is the proof of the existence of an oriented volume. 

Exercises 

1. Prove that if A of (34.4) holds, then B is equivalent 
to both Property 1 and Property 2. 

2. Prove that whatever the real number α may be there is a system of vectors
such that its oriented volume equals α.

3. Suppose C of (34.4) is replaced by the condition of equality to any fixed 
number on any fixed linearly independent system. How is an oriented volume 
affected? 

4. Were the presence of a scalar product in a vector space and the reality of 
an oriented volume used in deriving the properties of an oriented volume? What 
will change if we consider an oriented volume in a complex space? 

38. Permutations 

Consider a system $x_1, x_2, \ldots, x_n$ and a system
$x_{j_1}, x_{j_2}, \ldots, x_{j_n}$ obtained from the first using several
permutations of vectors. Suppose these systems may be transformed into
each other by successive interchanges of only pairs of elements.
Then the volumes of the systems will be the same and their oriented
volumes will be either the same or differ in sign depending on the
number of transpositions required.
In the questions of permutations we are going to discuss, individual
properties of vectors will play no part; what will be important is
their order. Therefore instead of vectors we shall consider their
indices 1, 2, ..., n. A collection of numbers

$j_1, j_2, \ldots, j_n$

among which there are no equal numbers and each of which is one
of the numbers 1, 2, ..., n is called a permutation of those numbers.
The permutation 1, 2, ..., n is called normal.

It is easy to show that a set of n numbers has n! possible permu¬ 
tations in all. Indeed, for n = 1 this is obvious. Let the statement be 
true for any set of n — 1 numbers. All permutations of n numbers 
can be grouped into n classes by placing in the same class only per¬ 
mutations that have the same number in the first place. The number 
of permutations in each class coincides with the number of permu¬ 
tations of n — 1 numbers, i.e. is equal to (n — 1)!. Consequently, 
the number of all permutations of n numbers is n!.

It is said that in a given permutation numbers i and j form an
inversion if i > j but i precedes j in the permutation. A permutation
is said to be even if its numbers constitute an even number of inver¬ 
sions and odd otherwise. If in some permutation we interchange any 
two numbers, not necessarily adjacent, leaving all the others in 
their places, we obtain a new permutation. This transformation of 
a permutation is called a transposition. 

We prove that any transposition changes the parity of the per¬ 
mutation. For adjacent numbers this statement is obvious. Their 
relative positions with respect to other numbers remain the same 
and permutation of the adjacent numbers changes the total number 
of inversions by unity. 

Suppose now that between numbers i and j to be interchanged
there are s other numbers $k_1, k_2, \ldots, k_s$, i.e. the permutation is
of the form

$\ldots, i, k_1, k_2, \ldots, k_s, j, \ldots$ .

We shall interchange the number i successively with the adjacent
numbers $k_1, k_2, \ldots, k_s, j$. Then the number j now preceding i is
transferred to the left by s transpositions with numbers
$k_s, k_{s-1}, \ldots, k_1$. We thus carry out 2s + 1 transpositions of adjacent
numbers in all. Consequently, the permutation will change its parity.
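
The notions of inversion, parity and transposition are easy to try out on a small permutation. The following sketch is only an illustration of the statement just proved; the function names `inversions`, `is_even` and `transpose` and the sample permutation are chosen here and are not part of the text.

```python
from itertools import combinations

def inversions(perm):
    """Number of pairs in which a larger number precedes a smaller one."""
    return sum(1 for s, t in combinations(range(len(perm)), 2)
               if perm[s] > perm[t])

def is_even(perm):
    """True if the permutation has an even number of inversions."""
    return inversions(perm) % 2 == 0

def transpose(perm, s, t):
    """Interchange the numbers in places s and t (counted from zero)."""
    p = list(perm)
    p[s], p[t] = p[t], p[s]
    return tuple(p)

perm = (3, 1, 4, 2)
print(inversions(perm), is_even(perm))

# Every transposition, adjacent or not, changes the parity.
parity_flips = all(
    is_even(transpose(perm, s, t)) != is_even(perm)
    for s, t in combinations(range(len(perm)), 2)
)
print(parity_flips)
```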

Theorem 38.1. All n! permutations of n numbers can be arranged in 
such an order that each subsequent permutation is obtained from the 
preceding one by a single transposition, beginning with any permutation. 

Proof. This is true for n = 2. If it is required to begin with the
permutation 1, 2, then the desired arrangement is 1, 2; 2, 1. If,
however, we are to begin with the permutation 2, 1, then the desired
arrangement is 2, 1; 1, 2.
Suppose the theorem has already been proved for any permutations
containing no more than n - 1 numbers. Consider permutations
of n numbers. Suppose we are to begin with a permutation
$i_1, i_2, \ldots, i_n$. We shall arrange permutations according to the following
principle. We begin with permutations with $i_1$ in the first place.
According to the assumption all these permutations can be ordered
in accordance with the requirements of the theorem, since in fact
it is necessary to arrange in the required order all permutations
of n - 1 numbers.

In the last permutation obtained in this way we make one transposition,
transferring to the first place the number $i_2$. We then put
in order, as in the preceding case, all permutations with a given
number in the first place and so on. In this way it is possible to look
over all permutations of n numbers.

With such a system of arranging permutations of n numbers
adjacent permutations will have opposite parities. Considering
that n! is even for n ≥ 2 we can conclude that in this case the number
of even permutations of n numbers equals that of odd permutations
and is n!/2.
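
The inductive construction in the proof can be written as a short recursive procedure. The sketch below follows the proof under one stated assumption: the leading numbers are brought to the first place in the order in which they stand in the starting permutation. The names `transposition_chain` and `inversions` are illustrative.

```python
from math import factorial

def inversions(perm):
    """Number of inversions of a permutation."""
    n = len(perm)
    return sum(1 for s in range(n) for t in range(s + 1, n)
               if perm[s] > perm[t])

def transposition_chain(start):
    """List all permutations of `start`, beginning with `start`, so that each
    one is obtained from the preceding one by a single transposition."""
    start = tuple(start)
    if len(start) <= 1:
        return [start]
    chain = []
    current = list(start)
    for step, head in enumerate(start):
        if step > 0:
            # One transposition brings the next number to the first place.
            pos = current.index(head)
            current[0], current[pos] = current[pos], current[0]
        # Induction step: arrange all permutations with `head` in first place.
        for tail in transposition_chain(current[1:]):
            chain.append((head,) + tail)
        current = list(chain[-1])
    return chain

chain = transposition_chain((2, 4, 1, 3))
even_count = sum(1 for p in chain if inversions(p) % 2 == 0)
print(len(chain), even_count)
```

Since neighbouring permutations in the chain differ by one transposition, their parities alternate, which is exactly the argument used above to count even and odd permutations.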


Exercises 

1. What is the parity of the permutation 5, 2, 3, 1, 4? 

2. Prove that no even (odd) permutation can be transformed into a normal 
one in an odd (even) number of transpositions. 

3. Consider a pair of permutations $i_1, i_2, \ldots, i_n$ and 1, 2, ..., n. We
transform into the normal form the first permutation employing transpositions,
making for each of them one transposition of any elements in the second
permutation. Prove that after the process is over the second permutation will have the
same parity as the permutation $i_1, i_2, \ldots, i_n$.


39. The existence 

of an oriented volume 

We now discuss the existence of an oriented
volume of a system of vectors. Choose in a space $E_n$ an orthonormal
system $z_1, z_2, \ldots, z_n$ on which an oriented volume must equal
unity by Condition C of (34.4). Take an arbitrary system
$x_1, x_2, \ldots, x_n$ of vectors in $E_n$. Since the system $z_1, z_2, \ldots, z_n$ is a basis
in $E_n$, for every vector $x_i$ there is an expansion

$x_i = a_{i1} z_1 + a_{i2} z_2 + \ldots + a_{in} z_n$ (39.1)

with respect to that basis, where $a_{ij}$ are some numbers.

If an oriented volume exists, then according to A of (34.4) we can
transform it successively, taking into account expansions (39.1).
Namely,

$V^{\pm}(x_1, x_2, \ldots, x_n) = V^{\pm}\Bigl(\sum_{j_1=1}^{n} a_{1j_1} z_{j_1}, \sum_{j_2=1}^{n} a_{2j_2} z_{j_2}, \ldots, \sum_{j_n=1}^{n} a_{nj_n} z_{j_n}\Bigr)$

$= \sum_{j_1=1}^{n} a_{1j_1} V^{\pm}\Bigl(z_{j_1}, \sum_{j_2=1}^{n} a_{2j_2} z_{j_2}, \ldots, \sum_{j_n=1}^{n} a_{nj_n} z_{j_n}\Bigr)$

$= \sum_{j_1=1}^{n} \sum_{j_2=1}^{n} a_{1j_1} a_{2j_2} V^{\pm}\Bigl(z_{j_1}, z_{j_2}, \ldots, \sum_{j_n=1}^{n} a_{nj_n} z_{j_n}\Bigr)$

$= \ldots = \sum_{j_1=1}^{n} \sum_{j_2=1}^{n} \cdots \sum_{j_n=1}^{n} a_{1j_1} a_{2j_2} \cdots a_{nj_n} V^{\pm}(z_{j_1}, z_{j_2}, \ldots, z_{j_n})$. (39.2)

In the last n-fold sum most of the terms are zero, since by
Property 1 the oriented volume of a system of vectors is zero if any two
vectors of the system coincide. Therefore only those of the systems
$z_{j_1}, z_{j_2}, \ldots, z_{j_n}$ should be considered for which the set
of indices $j_1, j_2, \ldots, j_n$ is a permutation of the n numbers
1, 2, ..., n. But in this case

$$V^{\pm}(z_{j_1}, z_{j_2}, \ldots, z_{j_n}) = \pm 1$$

depending on the evenness or oddness of the permutation of indices.

Thus if an oriented volume exists, then it must be expressible
in terms of the coordinates of the vectors $x_1, x_2, \ldots, x_n$ in
the basis $z_1, z_2, \ldots, z_n$ by the following formula:

$$V^{\pm}(x_1, x_2, \ldots, x_n) = \sum \pm\, a_{1j_1} a_{2j_2} \cdots a_{nj_n}. \tag{39.3}$$

Here summation is taken over all permutations of indices
$j_1, j_2, \ldots, j_n$ of the numbers 1, 2, ..., n, and a plus or a minus
sign is taken according as the permutation is even or odd.
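Formula (39.3) translates almost literally into code. The following is a minimal sketch rather than a practical algorithm: it assumes the coordinates are given as a list of rows `a`, uses 0-based indices, and determines the sign from the number of inversions; the names `sign` and `oriented_volume` are illustrative.

```python
from fractions import Fraction
from itertools import permutations

def sign(perm):
    """+1 for an even permutation, -1 for an odd one (inversion count)."""
    inv = sum(
        1
        for i in range(len(perm))
        for j in range(i + 1, len(perm))
        if perm[i] > perm[j]
    )
    return -1 if inv % 2 else 1

def oriented_volume(a):
    """Right-hand side of (39.3): sum over all permutations j of +-a[0][j0]*...*a[n-1][j(n-1)]."""
    n = len(a)
    total = Fraction(0)
    for js in permutations(range(n)):
        term = Fraction(sign(js))
        for i in range(n):
            term *= a[i][js[i]]
        total += term
    return total

print(oriented_volume([[1, 0], [0, 1]]))  # 1 on the orthonormal system
print(oriented_volume([[1, 2], [3, 4]]))  # -2
```

Interchanging two rows of the input changes the sign of the result, in line with Condition B.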

We prove that the function given by the right-hand side of (39.3)
satisfies all conditions defining an oriented volume. Let a vector $x_p$
be a linear combination of vectors $x_p'$ and $x_p''$, i.e.

$$x_p = \alpha x_p' + \beta x_p''$$

for some numbers $\alpha$ and $\beta$. Denote by $a_{pj}'$ and $a_{pj}''$
respectively the coordinates of $x_p'$ and $x_p''$ in the basis
$z_1, z_2, \ldots, z_n$. Then it is obvious that

$$a_{pj} = \alpha a_{pj}' + \beta a_{pj}''$$

for every $j$ in the range from 1 to n. We further find

$$
\begin{aligned}
\sum \pm\, a_{1j_1} \cdots a_{pj_p} \cdots a_{nj_n}
&= \sum \pm\, a_{1j_1} \cdots (\alpha a_{pj_p}' + \beta a_{pj_p}'') \cdots a_{nj_n} \\
&= \alpha \sum \pm\, a_{1j_1} \cdots a_{pj_p}' \cdots a_{nj_n}
 + \beta \sum \pm\, a_{1j_1} \cdots a_{pj_p}'' \cdots a_{nj_n},
\end{aligned}
$$

and thus A of (34.4) holds.

Suppose we permute some two vectors of the system $x_1, x_2, \ldots, x_n$.
In this case function (39.3) changes sign since the parity of each
permutation is changed. As already noted, if the property of linearity
in each independent variable holds, the property just proved is
equivalent to Condition B of (34.4).

And finally consider the value of the constructed function on
the system of vectors $z_1, z_2, \ldots, z_n$. For that system the
coordinates have the following form:

$$a_{ij} = \begin{cases} 0 & \text{if } i \ne j, \\ 1 & \text{if } i = j. \end{cases}$$

Consequently, of the terms of (39.3) only one term,
$a_{11} a_{22} \cdots a_{nn}$, is nonzero. The permutation 1, 2, ..., n
is an even one and the elements $a_{11}, a_{22}, \ldots, a_{nn}$ equal
unity, so the value of the function on the orthonormal system
$z_1, z_2, \ldots, z_n$ is 1.

Thus all Conditions (34.4) hold and function (39.3) is an expression
of the oriented volume of a system of vectors in terms of their
coordinates. That expression is unique by virtue of the uniqueness
of an oriented volume.


Exercises 

1. Was the orthonormality of the system $z_1, z_2, \ldots, z_n$
actually used in deriving formula (39.3)?

2. What changes will occur in formula (39.3) if in Condition C of (34.4) the
oriented volume is not assumed to equal unity?

3. To what extent was Condition B of (34.4) actually used in deriving (39.3)?

4. Will the form of (39.3) be affected if we consider an oriented volume in 
a complex space? 


40. Determinants 


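Let vectors $x_1, x_2, \ldots, x_n$ of a Euclidean
space $R_n$ be given by their coordinates

$$x_i = (a_{i1}, a_{i2}, \ldots, a_{in})$$

in basis (21.7). Arrange the numbers $a_{ij}$ as an array A:

$$A = \begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{1n} \\
a_{21} & a_{22} & \ldots & a_{2n} \\
\ldots & \ldots & \ldots & \ldots \\
a_{n1} & a_{n2} & \ldots & a_{nn}
\end{pmatrix}$$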

This array is called a square $n \times n$ matrix and the numbers $a_{ij}$
are matrix elements. If the matrix rows are numbered in succession
from top to bottom and the columns are numbered from left to right,
then the first index of an element stands for the number of the row
the element is in and the second index is the number of the column.
The elements $a_{11}, a_{22}, \ldots, a_{nn}$ are said to form the
principal diagonal of the matrix A.

Any $n^2$ numbers can be arranged as a square $n \times n$ matrix. If
the row elements of a matrix are assumed to be the coordinates of
a vector of $R_n$ in basis (21.7), then a 1-1 correspondence is established
between all square $n \times n$ matrices and ordered systems of n vectors
of the space $R_n$.

In $R_n$, as in any other space, there is an oriented volume. It will
be unique if we require that Condition C of (34.4) should hold on
the system of vectors (21.7). Taking into account the above 1-1
correspondence we conclude that a well-defined function is generated
on the set of all square matrices. Considering (39.3) we arrive at the
following definition of that function.

An nth-order determinant corresponding to a matrix A is an algebraic
sum of n! terms which is made up as follows. The terms of the
determinant are all possible products of n matrix elements taken
one from each row and each column. A term is taken with
a plus sign if the indices of the columns of its elements form an even
permutation, provided the elements are arranged in increasing
order of the row indices, and with a minus sign otherwise.

To designate a determinant we shall use the following symbol:

$$\det \begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{1n} \\
a_{21} & a_{22} & \ldots & a_{2n} \\
\ldots & \ldots & \ldots & \ldots \\
a_{n1} & a_{n2} & \ldots & a_{nn}
\end{pmatrix} \tag{40.1}$$

if it is necessary to give matrix elements in explicit form. If, however,
this is not necessary, we shall use a simpler symbol,

$$\det A,$$

restricting ourselves to the notation of a matrix A. The elements
of the matrix of a determinant will also be called the elements of the
determinant.

The determinant coincides with the oriented volume of the system
of matrix rows. In investigating it, therefore, it is possible to use
all known facts pertaining to volumes and oriented volumes. In
particular, the determinant is zero if and only if the matrix rows
are linearly dependent, the determinant changes sign when two rows
are interchanged, and so on. Now our studies will concern those of its
properties which are difficult to prove without using an explicit
expression of the determinant in terms of matrix elements.

The transpose of a matrix is a transformation such that its rows
become columns and its columns become rows with the same indices.
The transpose of a matrix A is denoted by A'. Accordingly the
determinant

$$\det \begin{pmatrix}
a_{11} & a_{21} & \ldots & a_{n1} \\
a_{12} & a_{22} & \ldots & a_{n2} \\
\ldots & \ldots & \ldots & \ldots \\
a_{1n} & a_{2n} & \ldots & a_{nn}
\end{pmatrix}$$

or det A' is said to be obtained by transposing determinant (40.1).
As to transposition the determinant possesses the following important
property:

The determinant of any matrix remains unaffected when transposed. 

Indeed, the determinant of a matrix A consists of terms of the
following form:

$$a_{1j_1} a_{2j_2} \cdots a_{nj_n} \tag{40.2}$$

whose sign depends on the parity of the permutation
$j_1, j_2, \ldots, j_n$.

In the transpose A' all multipliers of product (40.2) remain in
different rows and different columns, i.e. their product is a term of
the transposed determinant. We denote the elements of A' by $a_{ij}'$.
It is clear that $a_{ij}' = a_{ji}$ and therefore

$$a_{1j_1} a_{2j_2} \cdots a_{nj_n} = a_{j_1 1}' a_{j_2 2}' \cdots a_{j_n n}'. \tag{40.3}$$

We put the elements of the right-hand side of (40.3) in increasing
order of row indices. Then the permutation of column indices will
have the same parity as the permutation $j_1, j_2, \ldots, j_n$.

But this means that the sign of term (40.2) in the transposed
determinant is the same as that in the original determinant. Consequently,
both determinants consist of the same terms with the same signs,
i.e. they coincide.

It follows from the above property that the rows and columns of 
a determinant are equivalent. Therefore all the properties proved 
earlier for the rows will hold for the columns. 

Consider a determinant d of order n. Choose in its matrix k
arbitrary rows and k arbitrary columns. The elements at the intersection
of the chosen rows and columns form a $k \times k$ matrix. The
determinant of that matrix is called a kth-order minor of the determinant d.
The minor in the first k columns and the first k rows is called the
principal minor.

Suppose now that in an nth-order determinant d a minor M of order
k is taken. If we eliminate the rows and columns at whose intersection
the minor M lies, we are left with a minor N of order $n - k$.
This is called the complementary minor of M. If on the contrary we
eliminate the rows and columns occupied by the elements of the
minor N, then the minor M is obviously left. It is thus possible to
speak of a pair of mutually complementary minors.
If a minor M of order k is in the rows with indices
$i_1, i_2, \ldots, i_k$ and the columns with indices
$j_1, j_2, \ldots, j_k$, then the number

$$(-1)^{\sum_{p=1}^{k} (i_p + j_p)} N \tag{40.4}$$

will be called the algebraic adjunct or cofactor of M.

Theorem 40.1 (Laplace). Suppose in a determinant d of order n, k
arbitrary rows (columns) are chosen, with $1 \le k \le n - 1$. Then the
sum of the products of all kth-order minors contained in the chosen rows
(columns) by their algebraic adjuncts is equal to the determinant d.

Proof. Assume the columns of the matrix of d to be vectors
$x_1, x_2, \ldots, x_n$ of a space $R_n$. The sum of the products of all
kth-order minors contained in the chosen rows by their algebraic adjuncts
may be considered as some function $F(x_1, x_2, \ldots, x_n)$ of the
vectors $x_1, x_2, \ldots, x_n$.

This function is obviously linear in each independent variable,
since this property is true of both minors and algebraic adjuncts.
It equals unity on the orthonormal system (21.7), which is easy
to show by direct check. If we prove that $F(x_1, x_2, \ldots, x_n)$
changes sign when any two vectors are interchanged, then this will
establish that it coincides with the oriented volume of the vector
system $x_1, x_2, \ldots, x_n$. But the oriented volume coincides with
the determinant of a matrix in which the coordinates of vectors are
contained in the rows. Since the determinant of a matrix coincides with
the determinant of the transpose of the matrix, the proof of the theorem
is complete.

Obviously it suffices to consider only the case where two adjacent 
vectors are interchanged, for permutation of any two vectors always 
reduces to an odd number of permutations of adjacent vectors. The 
proof of this fact was given in Section 38. 

Suppose vectors $x_i$ and $x_{i+1}$ are interchanged. We establish a 1-1
correspondence between the minors in the chosen rows of the original
determinant and of the determinant with interchanged columns.
We denote by $\omega$ a set of column indices defining a minor. The
following cases are possible:

(1) $i,\ i + 1 \in \omega$,

(2) $i,\ i + 1 \notin \omega$,

(3) $i \in \omega,\ i + 1 \notin \omega$,

(4) $i + 1 \in \omega,\ i \notin \omega$.

In cases (1) and (2) each minor is assigned a minor on the columns
with the same set $\omega$, and in cases (3) and (4) a minor on the
columns with the set of indices obtained from $\omega$ by replacing
$i$ by $i + 1$ and $i + 1$ by $i$ respectively.
Note that in all cases the corresponding minors are defined by the
same set of elements. Moreover, in cases (2) to (4) they coincide and
in case (1) they differ only in sign, since each of them is obtained
from the other by interchanging two columns. For similar reasons,
the corresponding complementary minors differ in sign in case (2)
and coincide in the remaining cases. The algebraic adjuncts and
complementary minors coincide up to a sign which depends on the
parity of the sum of the indices of the rows and columns containing
the minor. These sums are the same in cases (1) and (2) and differ by
unity in cases (3) and (4).

Comparing now the corresponding terms of $F(x_1, x_2, \ldots, x_n)$
and those of the function resulting from permuting $x_i$ and $x_{i+1}$
we notice that they coincide up to a sign, and in each of the four cases
the signs are opposite. Consequently, if two adjacent vectors are
interchanged, $F(x_1, x_2, \ldots, x_n)$ changes sign.
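Laplace's theorem can be checked numerically on a small matrix. The sketch below is an illustration only: it assumes 0-based row and column indices (which shifts the exponent in (40.4) by the even number 2k and so leaves the sign unchanged), and the helper names `det` and `laplace` are not from the text.

```python
from fractions import Fraction
from itertools import combinations

def det(a):
    """Determinant by first-row expansion; the empty matrix has determinant 1."""
    if not a:
        return Fraction(1)
    return sum(
        (-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
        for j in range(len(a))
    )

def laplace(a, rows):
    """Sum over all minors in the chosen rows of (minor) * (algebraic adjunct)."""
    n, k = len(a), len(rows)
    other_rows = [i for i in range(n) if i not in rows]
    total = Fraction(0)
    for cols in combinations(range(n), k):
        other_cols = [j for j in range(n) if j not in cols]
        minor = det([[a[i][j] for j in cols] for i in rows])
        complementary = det([[a[i][j] for j in other_cols] for i in other_rows])
        sign = (-1) ** (sum(rows) + sum(cols))  # sign from (40.4)
        total += sign * minor * complementary
    return total

a = [[1, 2, 0, 1], [0, 1, 3, 2], [2, 0, 1, 1], [1, 1, 0, 2]]
print(laplace(a, [0, 1]) == det(a))  # True
print(laplace(a, [1, 3]) == det(a))  # True
```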

The theorem is often used where only one row or one column is
chosen. The determinant of a $1 \times 1$ matrix coincides with its only
element. Therefore the minor at the intersection of the ith row and
jth column is equal to the element $a_{ij}$. Denote by $A_{ij}$ the
algebraic adjunct of $a_{ij}$. By Laplace's theorem, for every i

$$a_{i1} A_{i1} + a_{i2} A_{i2} + \ldots + a_{in} A_{in} = d. \tag{40.5}$$

This formula is called the expansion of the determinant with respect
to the ith row. Similarly for every j

$$a_{1j} A_{1j} + a_{2j} A_{2j} + \ldots + a_{nj} A_{nj} = d, \tag{40.6}$$

which gives the expansion of the determinant with respect to the jth
column.

We replace in (40.5) the elements of the ith row by a collection of
n arbitrary numbers $b_1, b_2, \ldots, b_n$. The expression

$$b_1 A_{i1} + b_2 A_{i2} + \ldots + b_n A_{in}$$

is the expansion with respect to the ith row of the determinant

$$\det \begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{1n} \\
\ldots & \ldots & \ldots & \ldots \\
b_1 & b_2 & \ldots & b_n \\
\ldots & \ldots & \ldots & \ldots \\
a_{n1} & a_{n2} & \ldots & a_{nn}
\end{pmatrix} \tag{40.7}$$

obtained from the determinant d by replacing the ith row with the
row of the numbers $b_1, b_2, \ldots, b_n$. We now take as these numbers
the elements of the kth row of d, with $k \ne i$. The corresponding
determinant (40.7) is zero since it has two equal rows. Consequently,

$$a_{k1} A_{i1} + a_{k2} A_{i2} + \ldots + a_{kn} A_{in} = 0, \quad k \ne i. \tag{40.8}$$

Similarly

$$a_{1k} A_{1j} + a_{2k} A_{2j} + \ldots + a_{nk} A_{nj} = 0, \quad k \ne j. \tag{40.9}$$

So the sum of the products of all elements of any row (column) of
a determinant by the algebraic adjuncts of the corresponding
elements of another row (column) of the same determinant is zero.
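The row expansion and the vanishing of the mixed sums can be seen side by side in a few lines of code. This is a minimal sketch with 0-based indices; the names `minor_matrix`, `cofactor`, `det` and `row_times_cofactors` are illustrative.

```python
from fractions import Fraction

def minor_matrix(a, i, j):
    """Delete row i and column j (0-based) from the square matrix a."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(a) if k != i]

def det(a):
    """Determinant by expansion with respect to the first row."""
    if len(a) == 1:
        return Fraction(a[0][0])
    return sum(a[0][j] * cofactor(a, 0, j) for j in range(len(a)))

def cofactor(a, i, j):
    """Algebraic adjunct of the element in row i, column j."""
    return (-1) ** (i + j) * det(minor_matrix(a, i, j))

def row_times_cofactors(a, k, i):
    """Sum over j of a[k][j] * A[i][j]: det(a) if k == i, zero otherwise."""
    return sum(a[k][j] * cofactor(a, i, j) for j in range(len(a)))

a = [[2, 0, 1], [1, 3, 2], [1, 1, 4]]
print(row_times_cofactors(a, 1, 1))  # 18, the determinant
print(row_times_cofactors(a, 0, 1))  # 0, elements of one row, adjuncts of another
```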

In conclusion note that the entire theory of determinants can be 
extended without change to the case of complex matrices. The only 
thing lost is the visualization associated with the concept of volume. 


Exercises 

1. Write expressions for determinants of the second 
and third orders in terms of matrix elements. Compare them with expression 
(34.6). 

2. Write Hadamard's inequality for the determinant of matrices A and A'. 

3. A determinant of the nth order, all elements of which are equal to unity
in absolute value, equals $n^{n/2}$. Prove that its rows (columns) form an
orthogonal basis.

4. Find the determinant whose elements satisfy the conditions $a_{ij} = 0$ for

Of (* </, i>l, Kl). 

5. The elements of a determinant satisfy the conditions $a_{ij} = 0$ for $i > k$
and $j \le k$. Prove that the determinant is the product of the principal minor of
order k and its complementary minor.

6. Let the elements of a complex matrix A satisfy the conditions
$a_{ij} = \bar{a}_{ji}$ for all i and j. Prove that the determinant of such
a matrix is a real number.


41. Linear dependence 
and determinants 


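One of the most common applications of
determinants is in problems connected with linear dependence. Given
m vectors $x_1, x_2, \ldots, x_m$ in a space $K_n$ of dimension n, determine
their basis. Choose some basis in $K_n$ and consider the rectangular
array

$$A = \begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{1n} \\
a_{21} & a_{22} & \ldots & a_{2n} \\
\ldots & \ldots & \ldots & \ldots \\
a_{m1} & a_{m2} & \ldots & a_{mn}
\end{pmatrix} \tag{41.1}$$

where the rows represent the coordinates of the given vectors in the
chosen basis.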

Such an array is called a rectangular matrix. As before the first
index of an element $a_{ij}$ stands for the number of the matrix row
the element is in, and the second index is the number of the column.
If we want to stress what number of rows and columns a matrix A
has, we shall write $A\ (m \times n)$ or say that the matrix A is an m
by n matrix. A matrix $A\ (n \times n)$ will as before be called a square n
by n matrix. Along with the matrix A we shall consider its transpose
A'. If A is m by n, then A' is n by m.

In a rectangular matrix A (m X n) it is again possible to indicate 
different minors whose order of course does not exceed the smaller 
of the numbers m and n. If A has not only zero elements, then the 
highest order r of nonzero minors is said to be the rank of A. Any 
nonzero minor of order r is called a basis minor, and its rows and 
columns are basis rows and columns. It is clear that there may be more 
basis minors than one. The rank of a zero matrix is zero by definition. 

We shall regard the rows of A as vectors. It is obvious that if we
find a basis of these row vectors, then the corresponding vectors of
$K_n$ will form a basis of the vectors $x_1, x_2, \ldots, x_m$.

Theorem 41.1. Any basis rows of a matrix form a basis of row vectors 
of that matrix. 

Proof. To see that the theorem is true it is necessary to show that 
basis rows are linearly independent and that any matrix row is 
linearly expressible in terms of them. 

If basis rows were linearly dependent, then one of them would be 
linearly expressible in terms of the remaining basis rows. But then 
the basis minor would be equal to zero, which contradicts the
hypothesis.

Now add to the basis rows any other row of A. Then by the definition
of a basis minor all minors of order $r + 1$ in those rows will be
zero. Suppose the rows are linearly independent. By supplementing
them to a basis we construct some square matrix whose determinant
must not be zero. But, on the other hand, expanding that determinant
with respect to the original $r + 1$ rows we conclude that it is
zero. The resulting contradiction means that any row in A is linearly
expressible in terms of the basis rows.

The theorem reduces the problem of finding a basis of a system of
vectors to that of finding a basis minor of a matrix. Since the
determinant of the transpose of a matrix coincides with that of the
original matrix, it is clear that Theorem 41.1 is true not only for the
rows but also for the columns. This means that for any rectangular
matrix the rank of its system of row vectors equals the rank of its
system of column vectors, a fact that is not obvious if we have in mind
only the concept of the rank of a system of vectors.
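The definition of rank through minors can be turned directly into a (very slow) search for a basis minor. The following sketch is meant only to illustrate the definition and the equality of row and column ranks; it assumes exact integer or rational entries, and the names `det` and `rank_by_minors` are illustrative.

```python
from fractions import Fraction
from itertools import combinations

def det(a):
    """Determinant by first-row expansion; the empty matrix has determinant 1."""
    if not a:
        return Fraction(1)
    return sum(
        (-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
        for j in range(len(a))
    )

def rank_by_minors(a):
    """Return (rank, basis rows, basis columns) found by scanning minors from the largest order down."""
    m, n = len(a), len(a[0])
    for r in range(min(m, n), 0, -1):
        for rows in combinations(range(m), r):
            for cols in combinations(range(n), r):
                if det([[a[i][j] for j in cols] for i in rows]) != 0:
                    return r, rows, cols
    return 0, (), ()

a = [[1, 2, 3], [2, 4, 6], [1, 0, 1], [2, 2, 4]]
transpose = [list(col) for col in zip(*a)]
print(rank_by_minors(a)[0], rank_by_minors(transpose)[0])  # 2 2
```

The returned row indices (0-based) point at basis rows, which by Theorem 41.1 form a basis of the row vectors.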

In a space with scalar product the linear dependence or independence
of a system of vectors $x_1, x_2, \ldots, x_m$ can be established without
expanding with respect to a basis. Consider the determinant

$$G(x_1, x_2, \ldots, x_m) = \det \begin{pmatrix}
(x_1, x_1) & (x_1, x_2) & \ldots & (x_1, x_m) \\
(x_2, x_1) & (x_2, x_2) & \ldots & (x_2, x_m) \\
\ldots & \ldots & \ldots & \ldots \\
(x_m, x_1) & (x_m, x_2) & \ldots & (x_m, x_m)
\end{pmatrix}$$

called the Gram determinant or Gramian of the system of vectors
$x_1, x_2, \ldots, x_m$.

Theorem 41.2. A system of vectors is linearly dependent if and only
if its Gramian is zero.

Proof. Let $x_1, x_2, \ldots, x_m$ be a linearly dependent system of
vectors. Then there are numbers $\alpha_1, \alpha_2, \ldots, \alpha_m$,
not all zero, such that

$$\alpha_1 x_1 + \alpha_2 x_2 + \ldots + \alpha_m x_m = 0.$$

Performing scalar multiplication of this equation by $x_i$ for every i
we conclude that the columns of the Gramian are also linearly
dependent, i.e. the Gramian is zero.

Suppose now that the Gramian is zero. Then its columns are linearly
dependent, i.e. there are numbers $\alpha_1, \alpha_2, \ldots, \alpha_m$,
not all zero, such that

$$\alpha_1 (x_i, x_1) + \alpha_2 (x_i, x_2) + \ldots + \alpha_m (x_i, x_m) = 0$$

for every i. We rewrite these equations as follows:

$$(x_i,\ \alpha_1 x_1 + \alpha_2 x_2 + \ldots + \alpha_m x_m) = 0.$$

Multiplying them termwise by $\alpha_1, \alpha_2, \ldots, \alpha_m$ and
adding we get

$$|\alpha_1 x_1 + \alpha_2 x_2 + \ldots + \alpha_m x_m|^2 = 0.$$

This means that

$$\alpha_1 x_1 + \alpha_2 x_2 + \ldots + \alpha_m x_m = 0,$$

i.e. that the vectors $x_1, x_2, \ldots, x_m$ are linearly dependent.
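Theorem 41.2 gives a test for linear dependence that needs only scalar products. The sketch below assumes real coordinate vectors with the standard scalar product and exact rational arithmetic; the names `dot`, `det` and `gramian` are illustrative.

```python
from fractions import Fraction

def dot(x, y):
    """Standard scalar product of two real coordinate vectors."""
    return sum(Fraction(p) * q for p, q in zip(x, y))

def det(a):
    """Determinant by first-row expansion; the empty matrix has determinant 1."""
    if not a:
        return Fraction(1)
    return sum(
        (-1) ** j * a[0][j] * det([row[:j] + row[j + 1:] for row in a[1:]])
        for j in range(len(a))
    )

def gramian(vectors):
    """Determinant of the matrix of pairwise scalar products."""
    return det([[dot(x, y) for y in vectors] for x in vectors])

print(gramian([[1, 1, 0], [0, 1, 1]]))             # 3, independent system
print(gramian([[1, 1, 0], [0, 1, 1], [1, 2, 1]]))  # 0, third vector is the sum of the first two
```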

Exercises 

1. What is a matrix all of whose minors are zero? 

2. For a square matrix, are its basis rows and basis columns equivalent 
systems of vectors? 

3. Do the elementary transformations of its rows and columns, discussed in 
Section 15, affect the rank of a matrix? 

4. Prove the inequality

$$0 \le G(x_1, x_2, \ldots, x_n) \le \prod_{i=1}^{n} (x_i, x_i).$$

In what cases does the inequality become an equality?

5. It is obvious that



Prove that for any approximation of the number $\sqrt{2}$ by a finite decimal
fraction p


42. Calculation of determinants 

A straightforward calculation of a determinant
using its explicit expression in terms of matrix elements is rarely
employed in practice because it is laborious. A determinant
of the nth order consists of n! terms, and to calculate every term
and add it to the others requires about $n \cdot n!$
arithmetical operations. Even on a modern computer performing
$10^6$ arithmetical operations per second, computing in this way
the determinant of, for example, the hundredth order
would require many million years.

One of the most efficient ways of calculating determinants is based
on the following idea. Let $a_{kp}$ be a nonzero element in a matrix A.
We call it the leading element. Adding to any ith row, $i \ne k$, the kth
row multiplied by an arbitrary number $\alpha_i$ is known to leave the
determinant unaffected. Take

$$\alpha_i = -\frac{a_{ip}}{a_{kp}}$$

and carry out the indicated procedure for every $i \ne k$. All the
elements of the pth column of the new matrix, except the leading one,
will then be zero. Expanding the new determinant with respect to
the pth column we reduce calculating the determinant of the nth
order to calculating a single determinant of order $n - 1$. We
proceed in a similar way with this determinant and so on.

This algorithm is called the Gauss method. To calculate the
determinant of the nth order by this method would require carrying out a
total of about $2n^3/3$ arithmetical operations. Now the determinant of the
hundredth order could be computed, on a computer performing
$10^6$ arithmetical operations per second, in less than a second.
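A compact sketch of the elimination idea is given below. It is a common triangular variant, not necessarily the exact procedure of the text: the leading element is always taken in the current column, rows are interchanged when needed (each interchange changes the sign), and only the rows below the leading one are modified. Exact rational arithmetic is assumed, and the names `det_gauss` and `seconds_needed` are illustrative. The last lines redo the operation-count comparison for $n = 100$.

```python
import math
from fractions import Fraction

def det_gauss(a):
    """Determinant by elimination, accumulating the product of leading elements."""
    a = [[Fraction(x) for x in row] for row in a]
    n = len(a)
    result = Fraction(1)
    for p in range(n):
        # Find a row k >= p with a nonzero leading element in column p.
        k = next((r for r in range(p, n) if a[r][p] != 0), None)
        if k is None:
            return Fraction(0)          # the whole column is zero
        if k != p:
            a[p], a[k] = a[k], a[p]     # a row interchange changes the sign
            result = -result
        result *= a[p][p]
        for i in range(p + 1, n):
            alpha = -a[i][p] / a[p][p]  # multiplier that zeroes a[i][p]
            for j in range(p, n):
                a[i][j] += alpha * a[p][j]
    return result

def seconds_needed(operations, rate=10**6):
    """Time in seconds at `rate` arithmetical operations per second."""
    return operations / rate

n = 100
direct_years = seconds_needed(n * math.factorial(n)) / (365 * 24 * 3600)
gauss_seconds = seconds_needed(2 * n**3 / 3)
print(det_gauss([[2, 0, 1], [1, 3, 2], [1, 1, 4]]))  # 18
print(direct_years > 10**6, gauss_seconds < 1)       # True True
```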

In conclusion note that with arithmetical operations approximately
performed and information approximately given, the results
of computing determinants should be regarded with some caution. If
conclusions about linear dependence or independence of a system
of vectors are made only on the strength of the determinant being
zero or nonzero, then in the presence of instability pointed out in
Section 22 they may turn out to be false. This should be borne in
mind whenever a determinant is used.
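A two-row example shows how rounding alone can turn a zero determinant into a nonzero one. The sketch assumes ordinary binary floating-point arithmetic as provided by Python's `float`; the matrix is chosen for illustration and the name `det2` is not from the text.

```python
from fractions import Fraction

def det2(a):
    """Determinant of a 2 x 2 matrix."""
    return a[0][0] * a[1][1] - a[0][1] * a[1][0]

# Exactly, the first row (0.1 + 0.2, 0.3) is 0.3 times the second row (1, 1),
# so the rows are linearly dependent and the determinant is zero.
approx = [[0.1 + 0.2, 0.3], [1.0, 1.0]]
exact = [
    [Fraction(1, 10) + Fraction(2, 10), Fraction(3, 10)],
    [Fraction(1), Fraction(1)],
]

print(det2(exact))        # 0
print(det2(approx) != 0)  # True: rounding in 0.1 + 0.2 leaves a tiny nonzero value
```

Judging linear dependence by comparing the floating-point value with zero would give the wrong answer here.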

Exercises 

1. What accounts for a faster rate of calculation of a determinant
by the Gauss method as compared with straightforward calculation?

2. Let all elements of a determinant equal at most unity in absolute value
and suppose that in calculating each element we make an error of an order of
$\varepsilon$. For what n does a straightforward calculation of the determinant
make sense in terms of accuracy?

3. Construct an algorithm for calculating the rank of a rectangular matrix,
using the Gauss method. What is the result of applying this algorithm to
approximate calculations?


The Straight Line 
and the Plane 
in Vector Space 


43. The equations of a straight line 
and of a plane 

The main objects of our immediate studies are
the straight line and the plane in spaces of directed line segments. 
Given some coordinate system, the coordinates of the points on the 
straight line or in the plane can no longer be arbitrary and must 
satisfy certain relations. Now we proceed to derive those relations. 

Given in the plane a fixed Cartesian x, y system and a straight
line L, consider a nonzero vector

$$n = (A, B) \tag{43.1}$$

perpendicular to L. It is obvious that all other vectors perpendicular
to L will be collinear with n.

Take a point $M_0(x_0, y_0)$ on L. All points $M(x, y)$ of L and only
those points possess the property that the vectors $\overrightarrow{M_0M}$
and n are perpendicular, i.e.

$$(\overrightarrow{M_0M}, n) = 0. \tag{43.2}$$

Since

$$\overrightarrow{M_0M} = (x - x_0,\ y - y_0),$$

from (43.1) and (43.2) it follows that

$$A(x - x_0) + B(y - y_0) = 0.$$

Letting

$$-Ax_0 - By_0 = C,$$

we conclude that in the given x, y system the coordinates of the
points of L and only those coordinates satisfy

$$Ax + By + C = 0. \tag{43.3}$$

Among the numbers A and B there is a nonzero one. Therefore
equation (43.3) will be called a first-degree equation in variables x
and y.

We now prove that any first-degree equation (43.3) defines relative
to a fixed x, y coordinate system some straight line. Since (43.3) is
a first-degree equation, of the constants A and B at least one is not
zero. Hence (43.3) has at least one solution $x_0, y_0$, for example,

$$x_0 = -\frac{AC}{A^2 + B^2}, \qquad y_0 = -\frac{BC}{A^2 + B^2},$$

with

$$Ax_0 + By_0 + C = 0.$$

Subtracting this identity from (43.3) yields

$$A(x - x_0) + B(y - y_0) = 0,$$

which is equivalent to (43.3). But this means that any point $M(x, y)$ whose
coordinates satisfy the given equation (43.3) is on the straight line
passing through $M_0(x_0, y_0)$ and perpendicular to vector (43.1).

So, given a fixed coordinate system in the plane, any first-degree 
equation defines a straight line and the coordinates of the points of 
any straight line satisfy a first-degree equation. Equation (43.3) is 
called the general equation of a straight line in the plane and the 
vector n in (43.1) is the normal vector to that straight line. 
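Both directions of this argument are easy to trace in code. The sketch below assumes exact integer or rational inputs; the function names `line_from_point_and_normal` and `particular_point` are illustrative and not part of the text.

```python
from fractions import Fraction

def line_from_point_and_normal(point, normal):
    """Coefficients (A, B, C) of Ax + By + C = 0 through `point` with normal vector (A, B)."""
    (x0, y0), (a, b) = point, normal
    if a == 0 and b == 0:
        raise ValueError("the normal vector must be nonzero")
    return a, b, -a * x0 - b * y0

def particular_point(a, b, c):
    """The solution x0 = -AC/(A^2 + B^2), y0 = -BC/(A^2 + B^2) of Ax + By + C = 0."""
    d = Fraction(a * a + b * b)
    return -a * c / d, -b * c / d

a, b, c = line_from_point_and_normal((1, 2), (3, 4))
x0, y0 = particular_point(a, b, c)
print((a, b, c))             # (3, 4, -11)
print(a * x0 + b * y0 + c)   # 0, so the particular point lies on the line
```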

Without any fundamental changes we can carry out a study of the
plane in space. Fix a Cartesian x, y, z system and consider a plane
$\pi$. Again take a nonzero vector

$$n = (A, B, C) \tag{43.4}$$

perpendicular to $\pi$. Repeating the above reasoning we conclude that
all points $M(x, y, z)$ of $\pi$ and only those points satisfy the equation

$$Ax + By + Cz + D = 0, \tag{43.5}$$

which is also called a first-degree equation in variables x, y and z.

If we again consider an arbitrary first-degree equation (43.5),
then we shall see that it also has at least one solution $x_0, y_0, z_0$,
for example,

$$x_0 = -\frac{AD}{A^2 + B^2 + C^2}, \qquad y_0 = -\frac{BD}{A^2 + B^2 + C^2}, \qquad z_0 = -\frac{CD}{A^2 + B^2 + C^2}.$$

We further establish that any point $M(x, y, z)$ whose coordinates
satisfy the given equation (43.5) is in the plane through the point
$M_0(x_0, y_0, z_0)$ perpendicular to vector (43.4).

Thus, given a fixed coordinate system in space, any first-degree 
equation defines a plane and the coordinates of the points of any 
plane satisfy a first-degree equation. Equation (43.5) is called the 
general equation of a plane in space and the vector n in (43.4) is the 
normal vector to that plane. 

We shall now show how two general equations defining the same
straight line or plane are related. Suppose for definiteness that we
are given two equations of a plane $\pi$

$$
\begin{aligned}
A_1 x + B_1 y + C_1 z + D_1 &= 0, \\
A_2 x + B_2 y + C_2 z + D_2 &= 0.
\end{aligned}
\tag{43.6}
$$

The vectors

$$n_1 = (A_1, B_1, C_1), \qquad n_2 = (A_2, B_2, C_2)$$

are perpendicular to the same plane $\pi$ and therefore they are
collinear. Since furthermore they are nonzero, there is a number t such
that, for example,

$$n_1 = t n_2$$

or

$$A_1 = tA_2, \qquad B_1 = tB_2, \qquad C_1 = tC_2. \tag{43.7}$$

Multiplying the second of the equations (43.6) by t and subtracting
from it the first we get by virtue of (43.7)

$$D_1 = tD_2.$$

Consequently, the coefficients of the general equations defining the
same straight line or plane are proportional.

A general equation is said to be complete if all of its coefficients
are nonzero. An equation which is not complete is called incomplete.
Consider the complete equation of a straight line (43.3). Since all the
coefficients are nonzero, that equation can be written as

$$\frac{x}{-C/A} + \frac{y}{-C/B} = 1.$$

If we let

$$a = -\frac{C}{A}, \qquad b = -\frac{C}{B},$$

then we obtain a new equation of a straight line

$$\frac{x}{a} + \frac{y}{b} = 1.$$

This is the intercept form of the equation of a straight line. The
numbers a and b have a simple geometrical meaning. They are equal to
the magnitudes of the intercepts of the straight line on the
coordinate semiaxes (Fig. 43.1). Of course the complete equation of a plane
can be reduced to a similar form

$$\frac{x}{a} + \frac{y}{b} + \frac{z}{c} = 1.$$

Fig. 43.1


Different incomplete equations define special cases of location of
a straight line and a plane. It is useful to remember them since they
occur fairly often. For example, if $C = 0$, equation (43.3) defines a
straight line through the origin, if $B = C = 0$, the straight line
coincides with the y axis and so on. If $A = 0$, equation (43.5) defines
a plane parallel to the x axis, if $A = B = D = 0$, the plane
coincides with the x, y plane and so on.

Any nonzero vector parallel to a straight line will be called its
direction vector. Consider, for example, the case of a space and find
the equation of a straight line through a given point $M_0(x_0, y_0, z_0)$
with a given direction vector

$$q = (l, m, n).$$

Obviously, $M(x, y, z)$ is on the straight line if and only if the
vectors $\overrightarrow{M_0M}$ and q are collinear, i.e. if and only if the
coordinates of these vectors are proportional, i.e.

$$\frac{x - x_0}{l} = \frac{y - y_0}{m} = \frac{z - z_0}{n}. \tag{43.8}$$

These equations are precisely the desired equations of a straight
line. They are usually called the canonical equations of a straight
line. It is clear that in the case of a straight line in the plane the
equation will be of the form

$$\frac{x - x_0}{l} = \frac{y - y_0}{m} \tag{43.9}$$

if the straight line passes through the point $M_0(x_0, y_0)$ and has a
direction vector $q = (l, m)$.

Using the canonical equations it is easy to obtain the equation of
a straight line through two given points $M_0$ and $M_1$. To do this it
suffices to take as a direction vector the vector $\overrightarrow{M_0M_1}$,
express its coordinates in terms of the coordinates of $M_0$ and $M_1$ and
substitute them in equations (43.8) and (43.9). For example, in the case of a
straight line in the plane we shall have the following equation:

$$\frac{x - x_0}{x_1 - x_0} = \frac{y - y_0}{y_1 - y_0},$$

and in the case of a space we shall have

$$\frac{x - x_0}{x_1 - x_0} = \frac{y - y_0}{y_1 - y_0} = \frac{z - z_0}{z_1 - z_0}.$$

Notice that in canonical equations of a straight line the
denominators may turn out to be zero. Therefore in what follows the
proportion $a/b = c/d$ will be understood to be the equation $ad = bc$.
Consequently, the vanishing of one of the coordinates of a direction
vector implies the vanishing of the corresponding numerator in the
canonical equations.

To represent a straight line analytically it is common practice to
write the coordinates of its points as functions of some auxiliary

parameter t. Take as the parameter t the common value of the equal
ratios in (43.8) and (43.9). Then for the case of a space we shall have
the following equations of a straight line:

$$x = x_0 + lt, \qquad y = y_0 + mt, \qquad z = z_0 + nt \tag{43.10}$$

and similar equations for the case of a straight line in the plane

$$x = x_0 + lt, \qquad y = y_0 + mt. \tag{43.11}$$

These are called the parametric equations of a straight line.
Assigning to t different values yields different points of a straight line.
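The parametric equations and the convention that a proportion $a/b = c/d$ means $ad = bc$ can be illustrated together. In the sketch below points and direction vectors are plain tuples, and the names `direction`, `point_on_line` and `on_line` are illustrative; the membership test cross-multiplies, so zero coordinates of the direction vector cause no division problems.

```python
def direction(p0, p1):
    """Direction vector from p0 to p1."""
    return tuple(b - a for a, b in zip(p0, p1))

def point_on_line(p0, q, t):
    """Point with parameter t on the line through p0 with direction vector q."""
    return tuple(c + d * t for c, d in zip(p0, q))

def on_line(p, p0, q):
    """True if p - p0 is proportional to q, with every proportion read as ad = bc."""
    v = direction(p0, p)
    return all(
        v[i] * q[j] == v[j] * q[i]
        for i in range(len(q))
        for j in range(i + 1, len(q))
    )

p0, p1 = (1, 2, 3), (4, 2, 5)
q = direction(p0, p1)                      # (3, 0, 2): the second coordinate vanishes
print(point_on_line(p0, q, 1) == p1)       # True
print(on_line(point_on_line(p0, q, 5), p0, q), on_line((1, 3, 3), p0, q))  # True False
```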

The concept of determinant provides great possibilities and
convenience in writing the various equations of a straight line and of
a plane. We introduce, for example, the equation of
a plane through three different points not on the same straight line.
So, let $M_1(x_1, y_1, z_1)$, $M_2(x_2, y_2, z_2)$ and $M_3(x_3, y_3, z_3)$
be such points. Since they are not on the same straight line, the vectors

$$\overrightarrow{M_1M_2} = (x_2 - x_1,\ y_2 - y_1,\ z_2 - z_1), \qquad
\overrightarrow{M_1M_3} = (x_3 - x_1,\ y_3 - y_1,\ z_3 - z_1)$$

are not collinear. Therefore $M(x, y, z)$ is in the same plane with
$M_1$, $M_2$ and $M_3$ if and only if the vectors $\overrightarrow{M_1M_2}$
and $\overrightarrow{M_1M_3}$ and

$$\overrightarrow{M_1M} = (x - x_1,\ y - y_1,\ z - z_1)$$

are coplanar, i.e. if and only if the determinant made up of their
coordinates is zero. Consequently,

$$\det \begin{pmatrix}
x - x_1 & y - y_1 & z - z_1 \\
x_2 - x_1 & y_2 - y_1 & z_2 - z_1 \\
x_3 - x_1 & y_3 - y_1 & z_3 - z_1
\end{pmatrix} = 0 \tag{43.12}$$

is the equation of the desired plane through three given points.
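Equation (43.12) can be used as a membership test: evaluate the determinant for a candidate point and compare it with zero. The sketch assumes exact integer or rational coordinates; the names `det3` and `plane_residual` are illustrative.

```python
def det3(r1, r2, r3):
    """Third-order determinant with rows r1, r2, r3, expanded along the first row."""
    return (
        r1[0] * (r2[1] * r3[2] - r2[2] * r3[1])
        - r1[1] * (r2[0] * r3[2] - r2[2] * r3[0])
        + r1[2] * (r2[0] * r3[1] - r2[1] * r3[0])
    )

def plane_residual(m, m1, m2, m3):
    """Left-hand side of (43.12) for the point m; zero exactly when m is in the plane."""
    def diff(p, q):
        return tuple(b - a for a, b in zip(p, q))
    return det3(diff(m1, m), diff(m1, m2), diff(m1, m3))

m1, m2, m3 = (1, 0, 0), (0, 1, 0), (0, 0, 1)
print(plane_residual((2, -1, 0), m1, m2, m3))  # 0: the point is in the plane
print(plane_residual((0, 0, 0), m1, m2, m3))   # -1: the origin is not
```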

Consider, finally, the equation of a straight line in space through
a given point perpendicular to two nonparallel straight lines. Suppose
both straight lines are given by their canonical equations

$$\frac{x - x_1}{l_1} = \frac{y - y_1}{m_1} = \frac{z - z_1}{n_1}, \qquad \frac{x - x_2}{l_2} = \frac{y - y_2}{m_2} = \frac{z - z_2}{n_2}.$$

The direction vector $q$ of the desired straight line must be
perpendicular to two vectors

$$q_1 = (l_1, m_1, n_1), \quad q_2 = (l_2, m_2, n_2).$$

These are not collinear, and therefore it is possible to take as $q$,
for example, the vector product $[q_1, q_2]$. Recalling the expression
for the coordinates of the vector product in terms of the coordinates
of the factors and using for notation the determinants of the second
order we get

$$q = \left( \det \begin{pmatrix} m_1 & n_1 \\ m_2 & n_2 \end{pmatrix}, \ \det \begin{pmatrix} n_1 & l_1 \\ n_2 & l_2 \end{pmatrix}, \ \det \begin{pmatrix} l_1 & m_1 \\ l_2 & m_2 \end{pmatrix} \right).$$

If the desired straight line passes through the point
$M_0 (x_0, y_0, z_0)$, then its canonical equations will be as follows:

$$\frac{x - x_0}{\det \begin{pmatrix} m_1 & n_1 \\ m_2 & n_2 \end{pmatrix}} = \frac{y - y_0}{\det \begin{pmatrix} n_1 & l_1 \\ n_2 & l_2 \end{pmatrix}} = \frac{z - z_0}{\det \begin{pmatrix} l_1 & m_1 \\ l_2 & m_2 \end{pmatrix}}.$$
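
The three second-order determinants can be checked on a small example.
The sketch below is an illustration only; the names `det2`,
`perpendicular_direction` and `dot` and the two direction vectors are
assumptions chosen for the example.

```python
def det2(a, b, c, d):
    """Determinant of the 2x2 matrix with rows (a, b) and (c, d)."""
    return a * d - b * c

def perpendicular_direction(q1, q2):
    """Coordinates of the vector product [q1, q2] via 2x2 determinants."""
    l1, m1, n1 = q1
    l2, m2, n2 = q2
    return (det2(m1, n1, m2, n2), det2(n1, l1, n2, l2), det2(l1, m1, l2, m2))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Hypothetical direction vectors of two nonparallel straight lines.
q1, q2 = (1, 2, 3), (4, 5, 6)
q = perpendicular_direction(q1, q2)
```

Both scalar products `dot(q, q1)` and `dot(q, q2)` vanish, which is
exactly the required perpendicularity.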


Of course, fundamentally many conclusions concerning the equations
of a straight line and of a plane remain valid for any affine
coordinate system. Our desire to use Cartesian coordinate systems
is mainly due to their ensuring simpler calculations.

Exercises 

1. Write the equation of a straight line in the plane 
through two given points using a second-order determinant. Compare it with 
(43.12). 

2. Is it correct to say that (43.12) is always the equation of a plane? 

3. By analogy with equations (43.10) write the parametric equations of 
a plane in space. How many parameters must they contain? 

4. Find the coordinates of the normal vector to the plane through three given
points not on the same straight line.

5. What is the locus of points in space whose coordinates are solutions of 
a system of two linear algebraic equations in three unknowns? 


44. Relative positions 

Simultaneous consideration of several straight 
lines and planes gives rise to various problems, first of all to 
problems of determining their relative positions. 

Let two intersecting planes be given in space by their general
equations

$$A_1 x + B_1 y + C_1 z + D_1 = 0,$$
$$A_2 x + B_2 y + C_2 z + D_2 = 0.$$

They form two adjacent angles adding up to two right angles. We
find one of them. The vectors $n_1 = (A_1, B_1, C_1)$ and
$n_2 = (A_2, B_2, C_2)$ are normal ones, so determining the angle
between the planes reduces to determining the angle $\varphi$ between
$n_1$ and $n_2$. By (25.5)

$$\cos \varphi = \frac{A_1 A_2 + B_1 B_2 + C_1 C_2}{(A_1^2 + B_1^2 + C_1^2)^{1/2} (A_2^2 + B_2^2 + C_2^2)^{1/2}}.$$

Quite similarly derived is the formula for the angle between two
straight lines in the plane given by their general equations

$$A_1 x + B_1 y + C_1 = 0,$$
$$A_2 x + B_2 y + C_2 = 0.$$

One of the angles $\varphi$ made by these straight lines is calculated
from the formula

$$\cos \varphi = \frac{A_1 A_2 + B_1 B_2}{(A_1^2 + B_1^2)^{1/2} (A_2^2 + B_2^2)^{1/2}}.$$

The condition of the parallelism of straight lines given by their
general equations is that of the collinearity of their normal vectors,
i.e. the condition of the proportionality of their coordinates

$$\frac{A_1}{A_2} = \frac{B_1}{B_2}.$$

The condition that straight lines should be perpendicular coincides
with the condition that $\cos \varphi = 0$ or equivalently with the
condition that

$$A_1 A_2 + B_1 B_2 = 0.$$

Of course, of a similar form is the condition

$$\frac{A_1}{A_2} = \frac{B_1}{B_2} = \frac{C_1}{C_2}$$

of the parallelism of planes and the condition

$$A_1 A_2 + B_1 B_2 + C_1 C_2 = 0$$

of the perpendicularity of planes also given by their general
equations.
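
The formulas of this kind are easy to check numerically. The following
sketch is an illustration under stated assumptions: the names
`cos_angle` and `are_parallel` are hypothetical, and proportionality
is tested through vanishing second-order minors so that zero
coefficients cause no division by zero.

```python
import math

def cos_angle(n1, n2):
    """Cosine of the angle between two normal vectors."""
    dot = sum(a * b for a, b in zip(n1, n2))
    len1 = math.sqrt(sum(a * a for a in n1))
    len2 = math.sqrt(sum(a * a for a in n2))
    return dot / (len1 * len2)

def are_parallel(n1, n2):
    """True if the coordinates of n1 and n2 are proportional."""
    size = len(n1)
    return all(n1[i] * n2[j] == n1[j] * n2[i]
               for i in range(size) for j in range(i + 1, size))

# Hypothetical planes x = 0 and x + y = 0 meet at an angle of 45 degrees.
angle = math.degrees(math.acos(cos_angle((1, 0, 0), (1, 1, 0))))
```

The same two functions serve straight lines in the plane if
two-coordinate normal vectors $(A, B)$ are passed instead.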

Suppose now that two straight lines, for example in space, are
given by their canonical equations

$$\frac{x - x_1}{l_1} = \frac{y - y_1}{m_1} = \frac{z - z_1}{n_1},$$
$$\frac{x - x_2}{l_2} = \frac{y - y_2}{m_2} = \frac{z - z_2}{n_2}.$$

Since the direction vectors of those straight lines are the vectors
$q_1 = (l_1, m_1, n_1)$ and $q_2 = (l_2, m_2, n_2)$, we conclude again
that one of the angles $\varphi$ between the straight lines will
coincide with that between $q_1$ and $q_2$. Consequently,

$$\cos \varphi = \frac{l_1 l_2 + m_1 m_2 + n_1 n_2}{(l_1^2 + m_1^2 + n_1^2)^{1/2} (l_2^2 + m_2^2 + n_2^2)^{1/2}}.$$

Accordingly, the proportionality of the coordinates

$$\frac{l_1}{l_2} = \frac{m_1}{m_2} = \frac{n_1}{n_2}$$

is the condition of the parallelism of the straight lines, and the
equation

$$l_1 l_2 + m_1 m_2 + n_1 n_2 = 0$$

is the condition of their perpendicularity.

It is clear that if straight lines and planes are given in such a way
that their direction vector or normal vector is explicitly indicated,
then the finding of the angle between them always reduces to the
finding of the angle between these vectors. Suppose, for example,
a plane $\pi$ is given in space by its general equation

$$Ax + By + Cz + D = 0$$

and a straight line $L$ is given by its canonical equations

$$\frac{x - x_0}{l} = \frac{y - y_0}{m} = \frac{z - z_0}{n}.$$

Since the angle $\varphi$ between the straight line and the plane is
complementary to the angle $\psi$ between the direction vector of
the straight line and the normal vector to the plane (Fig. 44.1),
we have

$$\sin \varphi = |\cos \psi| = \frac{|Al + Bm + Cn|}{(A^2 + B^2 + C^2)^{1/2} (l^2 + m^2 + n^2)^{1/2}}.$$

It is obvious that

$$Al + Bm + Cn = 0$$

is the condition of the parallelism of the straight line and the plane,
and that

$$\frac{A}{l} = \frac{B}{m} = \frac{C}{n}$$

is the condition of the perpendicularity of the straight line and the
plane.

Assigning a straight line and a plane in the form of general equations
allows us to solve very efficiently an important problem, that
of calculating the distance from a point to the straight line and
from a point to the plane. The derivation of formulas is quite similar
in both cases and we again confine ourselves to a detailed discussion
of just one of them.

Let $\pi$ be a plane in space, given by its general equation (43.5).
Take a point $M_0 (x_0, y_0, z_0)$. Drop from $M_0$ a perpendicular to
the plane and denote by $M_1 (x_1, y_1, z_1)$ its foot. It is clear
that the distance $\rho (M_0, \pi)$ from $M_0$ to the plane is equal
to the length of $\overrightarrow{M_0 M_1}$.

The vectors $n = (A, B, C)$ and
$\overrightarrow{M_0 M_1} = (x_1 - x_0, y_1 - y_0, z_1 - z_0)$ are
perpendicular to the same plane and hence collinear. Therefore there
is a number $t$ such that $\overrightarrow{M_0 M_1} = tn$, i.e.

$$x_1 - x_0 = tA, \quad y_1 - y_0 = tB, \quad z_1 - z_0 = tC.$$

The point $M_1 (x_1, y_1, z_1)$ is in the plane $\pi$. Expressing its
coordinates in terms of the relations obtained and substituting them
in the equation of the plane we find

$$t = - \frac{A x_0 + B y_0 + C z_0 + D}{A^2 + B^2 + C^2}.$$

But the length of $n$ is $(A^2 + B^2 + C^2)^{1/2}$, and therefore
$|\overrightarrow{M_0 M_1}| = |t| (A^2 + B^2 + C^2)^{1/2}$.
Consequently,

$$\rho (M_0, \pi) = \frac{|A x_0 + B y_0 + C z_0 + D|}{(A^2 + B^2 + C^2)^{1/2}}.$$

In particular

$$\rho (0, \pi) = \frac{|D|}{(A^2 + B^2 + C^2)^{1/2}}.$$
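
The derivation translates directly into a short computation. The
sketch below is an illustration only: the function name
`foot_and_distance`, the tuple layout `(A, B, C, D)` and the sample
plane are assumptions made for the example.

```python
import math

def foot_and_distance(plane, m0):
    """plane = (A, B, C, D) for Ax + By + Cz + D = 0; m0 = (x0, y0, z0).

    Returns the foot M1 of the perpendicular and the distance rho(M0, pi).
    """
    a, b, c, d = plane
    value = a * m0[0] + b * m0[1] + c * m0[2] + d
    norm_sq = a * a + b * b + c * c
    t = -value / norm_sq                      # the number t of the derivation
    foot = (m0[0] + t * a, m0[1] + t * b, m0[2] + t * c)
    return foot, abs(value) / math.sqrt(norm_sq)

# Hypothetical data: the plane 2z - 4 = 0 (i.e. z = 2) and the point (1, 1, 5).
foot, dist = foot_and_distance((0.0, 0.0, 2.0, -4.0), (1.0, 1.0, 5.0))
```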


Along with the general equation (43.5) of a plane we consider the
following equation of the same plane:

$$\pm (A^2 + B^2 + C^2)^{-1/2} (Ax + By + Cz + D) = 0.$$

Of the two possible signs at the left we choose the one opposite to
the sign of $D$. If $D = 0$, then we choose any sign. Then the free
term of that equation is a nonpositive number $-p$, and the
coefficients of $x$, $y$ and $z$ are the cosines of the angles between
the normal vector and the coordinate axes. The equation

$$x \cos \alpha + y \cos \beta + z \cos \gamma - p = 0 \tag{44.1}$$

is called the normed equation of a plane. It is obvious that

$$\rho (M_0, \pi) = | x_0 \cos \alpha + y_0 \cos \beta + z_0 \cos \gamma - p |, \qquad \rho (0, \pi) = p.$$

The distance $\rho (M_0, L)$ from a point $M_0 (x_0, y_0)$ to a
straight line $L$ in the plane, given by its general equation (43.3),
is determined from a similar formula

$$\rho (M_0, L) = \frac{|A x_0 + B y_0 + C|}{(A^2 + B^2)^{1/2}}.$$

The normed equation of a straight line is like this:

$$x \cos \alpha + y \sin \alpha - p = 0. \tag{44.2}$$

Here $\alpha$ is the angle made by the normal vector with the $x$ axis.
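
The passage from a general equation to the normed equation (44.1) can
be sketched as follows. This is an illustration under assumptions: the
function name `normed_plane` and the sample coefficients are
hypothetical, and for $D = 0$ the sketch arbitrarily picks the plus
sign.

```python
import math

def normed_plane(a, b, c, d):
    """Return (cos_alpha, cos_beta, cos_gamma, p) of the normed equation."""
    norm = math.sqrt(a * a + b * b + c * c)
    sign = -1.0 if d > 0 else 1.0     # opposite to the sign of D
    k = sign / norm                   # normalizing factor
    return (k * a, k * b, k * c, -k * d)

# Hypothetical plane x + 2y + 2z - 9 = 0.
cos_a, cos_b, cos_g, p = normed_plane(1.0, 2.0, 2.0, -9.0)
# Distance from the point (1, 1, 1) read off from the normed equation.
dist = abs(cos_a * 1.0 + cos_b * 1.0 + cos_g * 1.0 - p)
```

Here `p` is the distance from the origin to the plane, and `dist`
agrees with the value given by the general distance formula.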


Exercises 

1. Under what condition on the coordinates of normal 
vectors do two straight lines in the plane (three planes in space) intersect in a 
single point? 

2. Under what condition is the straight line (43.8) in plane (43.5)? 

3. Under what condition are two straight lines in space, given by their 
canonical equations, in the same plane? 

4. Calculate the angles between a diagonal of a cube and its faces. 

5. Derive the formula for the distance between a point and a straight line in 
space, given by its canonical equations. 


45. The plane in vector space 

We have repeatedly stressed that a straight line and a plane passing
through an origin can be identified in spaces of directed line
segments with the geometrical representation of a subspace. But in
their properties they differ but little from any other straight lines
and planes obtained by translating or shifting these subspaces.
Wishing to extend this fact to arbitrary vector spaces we arrive at
the concept of plane in vector space.

Let $L$ be some subspace of a vector space $K$. Fix in $K$ a vector
$x_0$. In particular it may be in $L$. A set $H$ of vectors $z$
obtained from the formula

$$z = x_0 + y, \tag{45.1}$$

where $y$ is any vector of $L$, is called a plane in $K$. The vector
$x_0$ is a translation vector and the subspace $L$ is a direction
subspace. As to $H$ it will be said to be formed by translating $L$
by a vector $x_0$.

Formally the concept of plane includes those of straight lines and
planes (in vector interpretation!) in spaces of directed line segments.
But whether it possesses similar properties is as yet unknown to us.

Each vector of $H$ can be uniquely represented as a sum (45.1).
If $z = x_0 + y$ and $z = x_0 + y'$, where $y, y' \in L$, then it
follows that $y = y'$. In addition, from (45.1) it follows that the
difference of any two vectors of $H$ is in $L$.

Choose in $H$ a vector $z_0$. Let $z_0 = x_0 + y_0$. Represent (45.1)
as

$$z = z_0 + (y - y_0).$$

The sets of vectors $y$ and $(y - y_0)$ describe the same subspace $L$.
Therefore the last equation means that $H$ may be obtained by
translating $L$ by any fixed vector of the plane.

A plane $H$ is some set of vectors of $K$ generated by a subspace $L$
and a translation vector $x_0$ according to (45.1). It is a very
important fact that any plane may be generated by only one subspace.
Suppose that this is not the case, i.e. that there is another direction
subspace $L'$ and another translation vector $x_0'$ forming the same
plane $H$. Then for any $z \in H$ we have $z = x_0 + y$, where
$y \in L$, and at the same time $z = x_0' + y'$, where $y' \in L'$.
It follows that $L'$ is a collection of vectors of $K$ defined by the
formula

$$y' = (x_0 - x_0') + y.$$

Since the zero vector is in $L'$, it follows from the last formula
that the vector $(x_0 - x_0')$ is in $L$. But this means that $L'$
consists of the same vectors as $L$.

We have already noted that the translation vector is, of course, not
uniquely defined by a plane. However, here too the question of
uniqueness can be dealt with in a quite natural way.

Assume that a scalar product is introduced in $K$. It is clear that
we obtain the same plane if we take $\operatorname{ort}_L x_0$, i.e.
the component of $x_0$ orthogonal to $L$, instead of $x_0$. We may
therefore assume without loss of generality that $x_0 \perp L$. The
vector $x_0$ is then called the orthogonal translation vector. We can
now prove that every plane is generated by only one orthogonal
translation vector.

Indeed, suppose there are two translation vectors, $x_0'$ and $x_0''$,
orthogonal to $L$ but generating nevertheless the same plane $H$.
Then for any vector $y' \in L$ there must be a vector $y'' \in L$ such
that $x_0' + y' = x_0'' + y''$. It follows that $x_0' - x_0'' \in L$.
But under the hypothesis $x_0' - x_0'' \perp L$. Consequently,
$x_0' - x_0'' = 0$, i.e. $x_0' = x_0''$.

This means in particular that in a space with scalar product any
plane has only one vector orthogonal to the direction subspace.
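
A numerical illustration of the orthogonal translation vector in a real
space with the usual scalar product may be helpful. The sketch below is
not from the text: the name `orthogonal_translation` is hypothetical,
and the direction subspace is assumed to be given by a linearly
independent basis, which is orthogonalized by the Gram-Schmidt process.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthogonal_translation(x0, basis):
    """Remove from x0 its projection onto the span of `basis`.

    `basis` must be linearly independent; it is orthogonalized on the fly.
    """
    ortho = []
    for b in basis:
        for e in ortho:
            coef = dot(b, e) / dot(e, e)
            b = [bi - coef * ei for bi, ei in zip(b, e)]
        ortho.append(b)
    r = list(x0)
    for e in ortho:
        coef = dot(r, e) / dot(e, e)
        r = [ri - coef * ei for ri, ei in zip(r, e)]
    return r

# Hypothetical plane: L is spanned by (1, 0, 0) and (1, 1, 0).
basis = [(1, 0, 0), (1, 1, 0)]
first = orthogonal_translation((3, 4, 5), basis)
second = orthogonal_translation((5, -3, 5), basis)   # (3, 4, 5) + (2, -7, 0)
```

The two starting vectors differ by a vector of $L$, so they generate
the same plane, and both calls return the same orthogonal translation
vector.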

Two planes are said to be parallel if the direction subspace of one
is in the direction subspace of the other.

It is easy to justify this definition. Any two parallel planes $H_1$
and $H_2$ either have no vector in common or one of them is in the
other. Suppose $H_1$ and $H_2$ have in common a vector $z_0$. Since
any plane can be obtained by translating the direction subspace by any
of its vectors, both $H_1$ and $H_2$ can be obtained by translating
the corresponding subspaces by the vector $z_0$. But one of the
subspaces is in the other and therefore one of the planes is in the
other.

A subspace is a special case of a plane. It is obvious that a subspace
$L$ is parallel to any plane $H$ obtained by translating $L$ by some
vector $x_0$. From the established property of parallel planes it
follows that $H$ coincides with $L$ if and only if $x_0 \in L$.

Consider now two nonparallel planes $H_1$ and $H_2$. They either have
no vector in common or have a vector in common. In the former case
$H_1$ and $H_2$ are called crossing planes and in the latter they are
called intersecting planes.

Just as in the case of subspaces, the set of vectors which are in both
$H_1$ and $H_2$ is called the intersection of the planes and designated
$H_1 \cap H_2$. Let $H_1$ be formed by translating a subspace $L_1$ and
let $H_2$ be formed by translating a subspace $L_2$. Denote

$$H = H_1 \cap H_2, \quad L = L_1 \cap L_2.$$

Theorem 45.1. If an intersection H contains a vector $z_0$, then it is
a plane formed by translating an intersection L by that vector.

Proof. Under the hypothesis of the theorem there is a vector $z_0$
in $H$. Suppose there is another vector $z_1 \in H$. We represent it as

$$z_1 = z_0 + (z_1 - z_0).$$

Now from the sequence of relations

$$z_1, z_0 \in H \ \to \ z_1, z_0 \in H_1; \ z_1, z_0 \in H_2 \ \to \ z_1 - z_0 \in L_1, \ z_1 - z_0 \in L_2 \ \to \ z_1 - z_0 \in L$$

we conclude that any vector of $H$ can be represented as a sum of $z_0$
and some vector of $L$.

Take then a vector $f$ of $L$. We have

$$f \in L \ \to \ f \in L_1, \ f \in L_2 \ \to \ z_0 + f \in H_1, \ z_0 + f \in H_2 \ \to \ z_0 + f \in H,$$

i.e. any vector of $L$ translated by the vector $z_0$ is in $H$. Thus
the theorem is proved.

A plane is not necessarily a subspace. Nevertheless it can be assigned
a dimension equal to that of the direction subspace. A plane of
zero dimension contains only one vector, the translation vector.
In determining the dimension of an intersection of planes
Theorem 19.1 is useful. From Theorems 19.1 and 45.1 it follows that
the dimension of an intersection $H$ does not exceed the smaller of
the dimensions of $H_1$ and $H_2$.

If in spaces of directed line segments two (three) vectors are given,
then with some additional conditions it is possible to construct only
one plane of dimension 1 (2) containing the given vectors. Those
additional conditions can be formulated as follows. If two vectors
are given, then they must not coincide, i.e. they must not be in the
same plane of zero dimension. If three vectors are given, then they
must not be in the same plane of dimension one.

Similar facts hold in an arbitrary vector space.

Let $x_0, x_1, \ldots, x_k$ be vectors in a vector space. We shall say
that they are in a general position if they are not in the same plane
of dimension $k - 1$.

Theorem 45.2. If vectors $x_0, x_1, \ldots, x_k$ are in a general
position, then there is a unique plane H of dimension k containing
those vectors.

Proof. Consider the vectors $x_1 - x_0, x_2 - x_0, \ldots, x_k - x_0$.
If they were linearly dependent, then they would be in some subspace
of dimension $k - 1$ at most. Consequently, the vectors
$x_0, x_1, \ldots, x_k$ themselves would be in the plane obtained by
translating that subspace by the vector $x_0$, which contradicts the
hypothesis of the theorem.

So the vectors $x_1 - x_0, x_2 - x_0, \ldots, x_k - x_0$ are linearly
independent. Denote by $L$ their span. The subspace $L$ has dimension
$k$. By translating it by the vector $x_0$ we obtain some plane $H$ of
the same dimension which contains all the given vectors
$x_0, x_1, \ldots, x_k$.

The constructed plane $H$ is unique. Indeed, let the vectors
$x_0, x_1, \ldots, x_k$ be in two planes $H_1$ and $H_2$ of dimension
$k$. The plane remains the same if the translation vector is replaced
by any other vector of the plane. We may therefore assume without loss
of generality that $H_1$ and $H_2$ are obtained by translating the
corresponding subspaces $L_1$ and $L_2$ by the same vector $x_0$. But
then it follows that both subspaces coincide since they have dimension
$k$ and contain the same linearly independent system
$x_1 - x_0, x_2 - x_0, \ldots, x_k - x_0$.


Exercises 

1. Let $H_1$ and $H_2$ be any two planes. Define the sum
$H_1 + H_2$ to be the set of all vectors of the form $z_1 + z_2$, where
$z_1 \in H_1$ and $z_2 \in H_2$. Prove that the sum of the planes is a plane.

2. Let $H$ be a plane and let $\lambda$ be a number. Define the product
$\lambda H$ of $H$ by $\lambda$ to be the set of all vectors of the form
$\lambda z$, where $z \in H$. Prove that $\lambda H$ is a plane.

3. Will the set of all planes of the same space, with the operations
introduced above on them, be a vector space?

4. Prove that vectors $x_0, x_1, \ldots, x_k$ are in a general position if and
only if the vectors $x_1 - x_0, x_2 - x_0, \ldots, x_k - x_0$ are linearly
independent.

5. Prove that a plane of dimension $k$ containing vectors of a general
position $x_0, x_1, \ldots, x_k$ is a subspace if and only if those vectors
are linearly dependent.


46. The straight line and 
the hyperplane 

In a vector space $K$ of dimension $m$ two classes of planes occupy
a particular place. These are planes of dimension 1 and planes of
dimension $m - 1$. Geometrically any plane of dimension 1 in spaces of
directed line segments is a straight line. A plane of dimension
$m - 1$ is a hyperplane.

Consider a straight line $H$ in a vector space $K$. Denote by $x_0$
the translation vector and by $q$ the basis vector of the
one-dimensional direction subspace. Let these vectors be given by their
coordinates

$$x_0 = (x_1, x_2, \ldots, x_m),$$
$$q = (q_1, q_2, \ldots, q_m)$$

relative to some basis of $K$. It is obvious that any vector $z$ of
$H$ can be given by

$$z = x_0 + tq, \tag{46.1}$$

where $t$ is some number. Therefore relation (46.1) may be assumed to
be a vector equation of $H$ in $K$. If $z$ has in the same basis the
coordinates

$$z = (z_1, z_2, \ldots, z_m),$$

then, writing (46.1) coordinatewise, we get

$$\begin{aligned} z_1 &= x_1 + q_1 t, \\ z_2 &= x_2 + q_2 t, \\ &\ \ \vdots \\ z_m &= x_m + q_m t. \end{aligned} \tag{46.2}$$

Comparing now these equations with (43.10) and (43.11) it is natural
to call them parametric equations of a straight line $H$. We shall say
that a straight line $H$ passes through a vector $x_0$ and has a
direction vector $q$.

By Theorem 45.2, it is always possible to draw one and only one
straight line through any two distinct vectors $x_0$ and $y_0$. Let
$x_0$ and $y_0$ be vectors given in some basis of a space $K$ by their
coordinates

$$x_0 = (x_1, x_2, \ldots, x_m),$$
$$y_0 = (y_1, y_2, \ldots, y_m).$$

Since it is possible to take, for example, the vector $y_0 - x_0$ as
direction vector, equations (46.2) yield the parametric equations

$$\begin{aligned} z_1 &= x_1 + (y_1 - x_1) t, \\ z_2 &= x_2 + (y_2 - x_2) t, \\ &\ \ \vdots \\ z_m &= x_m + (y_m - x_m) t \end{aligned} \tag{46.3}$$

of the straight line through two given vectors.

When $t = 0$ these equations define the vector $x_0$ and when $t = 1$
they define the vector $y_0$. If $K$ is a real space, then the set of
vectors given by (46.3) for $0 \le t \le 1$ is said to be the line
segment joining vectors $x_0$ and $y_0$. Of course, this name is
associated with the geometrical representation of this set in spaces
of directed line segments.

Suppose $H$ intersects some plane. Then according to the consequence
arising from Theorem 45.1 the intersection is either a straight line
or a single vector. If the intersection is a straight line, then it
certainly coincides with the straight line $H$. But this means that
if a straight line intersects a plane, then the straight line is
either entirely in the plane or it and the plane have only one vector
in common.

The concept of hyperplane makes sense in any vector space, but
we shall use it only in spaces with scalar product.

Consider a hyperplane $H$. Let it be formed by translating an
$(m - 1)$-dimensional subspace $L$ by a vector $x_0$. The orthogonal
complement $L^{\perp}$ is in this case a one-dimensional subspace.
Denote by $n$ any one of its basis vectors. A vector $z$ is in $H$ if
and only if the vector $z - x_0$ is in $L$. That condition holds in
turn if and only if $z - x_0$ is orthogonal to $n$, i.e. if

$$(n, z - x_0) = 0. \tag{46.4}$$

Thus we have obtained an equation satisfied by all vectors of $H$.
To give a hyperplane in the form of this equation, it suffices to
indicate any vector $n$ orthogonal to the direction subspace and a
translation vector $x_0$.

The explicit form of the equation substantially simplifies various
studies. Given vectors $n_1, n_2, \ldots, n_k$ and
$x_1, x_2, \ldots, x_k$, investigate a plane $R$ which is the
intersection of the hyperplanes

$$\begin{aligned} (n_1, z - x_1) &= 0, \\ (n_2, z - x_2) &= 0, \\ &\ \ \vdots \\ (n_k, z - x_k) &= 0. \end{aligned} \tag{46.5}$$

This problem may be regarded as that of solving the system of
equations (46.5) for vectors $z$. Suppose that the intersection of the
hyperplanes is not empty, i.e. that (46.5) has at least one solution
$z_0$. Then, as we know, the desired plane is defined by the following
system as well:

$$\begin{aligned} (n_1, z - z_0) &= 0, \\ (n_2, z - z_0) &= 0, \\ &\ \ \vdots \\ (n_k, z - z_0) &= 0 \end{aligned} \tag{46.6}$$

since any plane remains unaffected if the translation vector is
replaced by any other vector of the plane.

A vector $y = z - z_0$ is an arbitrary vector of the intersection $L$
of the direction subspaces of all $k$ hyperplanes. It is obvious that
vectors $y$ of the subspace $L$ satisfy the following system:

$$\begin{aligned} (n_1, y) &= 0, \\ (n_2, y) &= 0, \\ &\ \ \vdots \\ (n_k, y) &= 0. \end{aligned} \tag{46.7}$$

Giving the intersection $L$ as system (46.7) allows an easy solution
of the question of the dimension of $L$. As is seen from the system
itself, the subspace $L$ is the orthogonal complement of the span of
the system of vectors $n_1, n_2, \ldots, n_k$. Let $r$ be the rank of
that system. Then the dimension of $L$ and hence of $R$ is $m - r$,
where $m$ is the dimension of the space. In particular, if
$n_1, n_2, \ldots, n_k$ are linearly independent, then the dimension
of plane (46.5) is $m - k$. It is of course assumed that the plane
exists, i.e. that system (46.5) has at least one solution. To give the
subspace $L$ defined by system (46.7) it suffices to indicate its
basis, i.e. any system of $m - r$ linearly independent vectors
orthogonal to $n_1, n_2, \ldots, n_k$.
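
For a real space with an orthonormal basis, a basis of $L$ can be
computed by row reduction of the matrix whose rows are the normal
vectors. The sketch below is an illustration under these assumptions;
the function name `null_space_basis` and the sample vectors are
hypothetical, and exact rational arithmetic is used to avoid rounding.

```python
from fractions import Fraction

def null_space_basis(normals):
    """Basis of L = {y : (n_i, y) = 0 for all i}, found by row reduction."""
    rows = [[Fraction(c) for c in n] for n in normals]
    m = len(rows[0])
    pivots = []
    r = 0                                   # number of pivot rows found so far
    for col in range(m):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        rows[r] = [c / rows[r][col] for c in rows[r]]
        for i in range(len(rows)):
            if i != r and rows[i][col] != 0:
                factor = rows[i][col]
                rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        pivots.append(col)
        r += 1
    basis = []
    for free in (c for c in range(m) if c not in pivots):
        y = [Fraction(0)] * m
        y[free] = Fraction(1)               # one free coordinate set to 1
        for row, col in zip(rows, pivots):
            y[col] = -row[free]             # pivot coordinates follow from it
        basis.append(y)
    return basis

# Hypothetical normals in a 4-dimensional space; the third is the sum of
# the first two, so the rank is r = 2 and the dimension of L is 4 - 2 = 2.
normals = [(1, 1, 1, 1), (1, 2, 3, 4), (2, 3, 4, 5)]
basis = null_space_basis(normals)
```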

Equation (46.4) of a hyperplane can be written in a somewhat
different form. Let $(n, x_0) = b$. Then

$$(n, z) = b \tag{46.8}$$

will define the same hyperplane as equation (46.4). Note that the
general equations (43.3) and (43.5) of a straight line and of a plane
are essentially given in the same form. It is important to stress
that any equation of the form (46.8) can be reduced to the form (46.4)
by a suitably chosen vector $x_0$. To do this it suffices, for example,
to take $x_0$ in the following form:

$$x_0 = \alpha n.$$

Substituting this expression in (46.4) and comparing with (46.8)
we conclude that we must have

$$\alpha = \frac{b}{(n, n)}.$$

We can now conclude that if a system of the form (46.5) defines
some plane, then the same plane may also be defined by a system of
the following form:

$$\begin{aligned} (n_1, z) &= b_1, \\ (n_2, z) &= b_2, \\ &\ \ \vdots \\ (n_k, z) &= b_k \end{aligned} \tag{46.9}$$

for appropriate numbers $b_1, b_2, \ldots, b_k$. The converse is
obviously true too. System (46.9) defines the same plane as (46.5)
does for appropriate vectors $x_1, x_2, \ldots, x_k$.

A straight line and a plane are hyperplanes in spaces $V_2$ and $V_3$
respectively. We have established earlier the relation of the distance
from a point to these hyperplanes to the result of a substitution of
the coordinates of the point in the general equations. There is a
similar relation in the case of an arbitrary hyperplane.

Let $H$ be a hyperplane given by equation (46.4). Denote as before
by $\rho (v, H)$ the distance between a vector $v$ and $H$. Taking
into account equation (46.4) we get

$$(n, v - x_0) = (n, v - x_0) - (n, z - x_0) = (n, v - z)$$

for any vector $z$ in $H$. According to the
Cauchy-Buniakowski-Schwarz inequality

$$|(n, v - x_0)| \le |n| \, |v - z|. \tag{46.10}$$

Therefore

$$|v - z| \ge \frac{|(n, v - x_0)|}{|n|}.$$

If we show that $H$ contains one and only one vector, $z^*$, for which
(46.10) becomes an equation, then this will mean, first, that

$$\rho (v, H) = \frac{|(n, v - x_0)|}{|n|} \tag{46.11}$$

and, second, that the value $\rho (v, H)$ is attained only on that
single vector $z^*$.

Denote by $L$ the direction subspace of $H$. It is clear that any
vector orthogonal to $L$ is collinear with $n$, and vice versa.
Inequality (46.10) becomes an equation if and only if $n$ and $v - z$
are collinear, i.e. $v - z = \alpha n$ for some number $\alpha$. Let
equality hold for two vectors, $z_1$ and $z_2$, of $H$, i.e.

$$v - z_1 = \alpha_1 n, \qquad v - z_2 = \alpha_2 n.$$

It follows that

$$z_1 - z_2 = (\alpha_2 - \alpha_1) n.$$

Consequently, $z_1 - z_2 \perp L$. But $z_1 - z_2 \in L$ as the
difference of two vectors of a hyperplane. Therefore $z_1 - z_2 = 0$
or $z_1 = z_2$.

Denote by $z_0$ a vector in $H$ orthogonal to $L$. As we know, that
vector exists and is unique. We write a vector $z$ as a sum

$$z = z_0 + y,$$

where $y \in L$. A vector $v$ can be represented as

$$v = f + s,$$

where $f \in L$ and $s \perp L$. Now

$$v - z = (s - z_0) + (f - y).$$

If we take $z = z_0 + f$, then

$$v - z = s - z_0.$$

The vector $h = s - z_0$ is orthogonal to $L$, and formula (46.11) is
established.

We have simultaneously proved that any vector $v$ of a space can
be represented uniquely as a sum

$$v = z + h,$$

where $z$ is a vector in $H$ and $h$ is orthogonal to $L$. By analogy
with spaces of directed line segments the vector $z$ in the
decomposition is called the projection of $v$ onto $H$ and $h$ is
called the perpendicular from $v$ to $H$. The process of obtaining the
vector $z$ from $v$ is also termed the projection of $v$ onto $H$. If
a hyperplane is given by (46.4), then $n$ is the normal vector to the
hyperplane. Given vectors $x_0$ and $n$ there is only one hyperplane
containing the vector $x_0$ and orthogonal to the vector $n$.
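
The decomposition $v = z + h$ and formula (46.11) can be illustrated in
a real space with an orthonormal basis. In the sketch below the
perpendicular is taken as $h = \alpha n$ with
$\alpha = (n, v - x_0) / (n, n)$, which follows from the collinearity
of $h$ and $n$; the function name `project_onto_hyperplane` and the
sample vectors are assumptions made for the example.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project_onto_hyperplane(v, n, x0):
    """Split v = z + h for the hyperplane (n, z - x0) = 0 in a real space."""
    alpha = dot(n, [vi - xi for vi, xi in zip(v, x0)]) / dot(n, n)
    h = [alpha * ni for ni in n]              # perpendicular from v to H
    z = [vi - hi for vi, hi in zip(v, h)]     # projection of v onto H
    return z, h

# Hypothetical data in a 4-dimensional space.
v, n, x0 = [3.0, 4.0, 5.0, 6.0], [0.0, 0.0, 3.0, 4.0], [1.0, 1.0, 0.0, 0.0]
z, h = project_onto_hyperplane(v, n, x0)
distance = math.sqrt(dot(h, h))               # length of the perpendicular
```

The length of `h` agrees with the right-hand side of (46.11), here
$39 / 5$.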


Exercises 

1. Prove that any plane distinct from the entire space 
may be given as the intersection of hyperplanes (46.9). 

2. Prove that a sum of hyperplanes is a hyperplane if and only if the
hyperplanes to be added are parallel.

3. Prove that a product of a hyperplane by a nonzero number is a
hyperplane.

4. Under what conditions are a straight line and a hyperplane parallel? 

5. Derive the formula for the distance between a vector and a straight line
given by equation (46.1).


47. The half-space 

Associated with the concepts of a straight line and of a hyperplane
is the notion of the so-called convex sets. Since these are widely
used in various fields of mathematics, we shall examine some of them.

A set of vectors of a real vector space is said to be convex if
together with every pair of vectors it contains the entire line
segment joining them.

Examples of convex sets are a single vector, a line segment, a
straight line, a subspace, a plane, a hyperplane and many others.

Suppose a hyperplane is given in a real space by

$$(n, z) - b = 0.$$

A set of vectors $z$ satisfying

$$(n, z) - b < 0 \tag{47.1}$$

or

$$(n, z) - b > 0 \tag{47.2}$$

is called an open half-space. Half-space (47.1) is negative and (47.2)
is positive.

Theorem 47.1. A half-space is a convex set.

Proof. Take two vectors, $x_0$ and $y_0$, and let

$$\Phi_1 = (n, x_0) - b, \qquad \Phi_2 = (n, y_0) - b.$$

If $z$ is any vector of the straight line through $x_0$ and $y_0$, then

$$z = x_0 + t (y_0 - x_0).$$

When $0 \le t \le 1$ we obtain vectors of the line segment joining
$x_0$ and $y_0$. We have

$$(n, z) - b = \Phi_1 (1 - t) + \Phi_2 t. \tag{47.3}$$

If $\Phi_1$ and $\Phi_2$ have the same sign, i.e. $x_0$ and $y_0$ are
in the same half-space, then the right-hand side of (47.3) will also
have the same sign for all values of $t$ satisfying $0 \le t \le 1$.

Thus any hyperplane divides a vector space into three nonintersecting
convex sets, the hyperplane itself and two open half-spaces.

Suppose that $x_0$ and $y_0$ are in different half-spaces, i.e. that
$\Phi_1$ and $\Phi_2$ have different signs. Formal transformation of
relation (47.3) results in the following equation:

$$(n, z) - b = (\Phi_2 - \Phi_1) \left( t - \frac{1}{1 - \Phi_2 / \Phi_1} \right).$$

It follows that the straight line through $x_0$ and $y_0$ intersects
the hyperplane. The intersection is defined by

$$t = \frac{1}{1 - \Phi_2 / \Phi_1},$$

satisfying $0 \le t \le 1$. So

If two vectors are in different half-spaces, then the line segment
joining those vectors intersects the hyperplane determining the
half-spaces.

Considering this property it is easy to understand what the
half-spaces of spaces of directed line segments are. In the plane the
terminal points of vectors of a half-space are on the same side of a
straight line and in space they are on the same side of a plane.
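
The value of $t$ at which the segment meets the hyperplane can be
computed directly from $\Phi_1$ and $\Phi_2$. The sketch below is an
illustration only; the function name `crossing_parameter` and the
sample data are assumptions made for the example.

```python
def crossing_parameter(n, b, x0, y0):
    """Parameter t at which x0 + t (y0 - x0) meets the hyperplane (n, z) = b.

    Returns None when x0 and y0 are not in different open half-spaces.
    """
    phi1 = sum(a * c for a, c in zip(n, x0)) - b
    phi2 = sum(a * c for a, c in zip(n, y0)) - b
    if phi1 * phi2 >= 0:
        return None
    return 1.0 / (1.0 - phi2 / phi1)

# Hypothetical hyperplane z1 + z2 = 1 in the plane.
t = crossing_parameter((1, 1), 1, (0, 0), (3, 1))          # different sides
same_side = crossing_parameter((1, 1), 1, (0, 0), (0.2, 0.2))
```

For the first pair the segment crosses at the vector
$(0.75, 0.25)$, whose coordinates indeed add up to 1.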

Along with open half-spaces closed half-spaces are not infrequently
considered. They are defined as sets of vectors $z$ satisfying

$$(n, z) - b \le 0 \tag{47.4}$$

or

$$(n, z) - b \ge 0. \tag{47.5}$$

Half-space (47.4) is nonpositive and (47.5) is nonnegative. Of course,
closed half-spaces are also convex sets.

Theorem 47.2. An intersection of convex sets is a convex set.

Proof. Obviously it suffices to consider the case of two sets $U_1$
and $U_2$. Let $U = U_1 \cap U_2$ be their intersection. Take any two
vectors $x_0$ and $y_0$ in $U$ and denote by $S$ the line segment
joining them. The vectors $x_0$ and $y_0$ are in both $U_1$ and $U_2$.
Therefore in view of the convexity of $U_1$ and $U_2$ the line segment
$S$ is entirely in both $U_1$ and $U_2$, i.e. it is in $U$.

This theorem plays an important part in the study of convex
sets. In particular, it allows us to say that a nonempty set of
vectors $z$, each satisfying the system of inequalities

$$\begin{aligned} (n_1, z) - f_1 &\le 0, \\ (n_2, z) - f_2 &\le 0, \\ &\ \ \vdots \\ (n_k, z) - f_k &\le 0, \end{aligned}$$

is a convex set. Such systems of inequalities are the basic element
in the description of many production planning problems, management
problems and the like.
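
As a small illustration of such a system, the sketch below checks
whether a vector satisfies every inequality and samples the segment
between two feasible vectors. The names `satisfies` and
`segment_point` and the constraints themselves are hypothetical
examples, not taken from the text.

```python
def satisfies(system, z):
    """system: list of pairs (n_i, f_i); checks (n_i, z) - f_i <= 0 for all i."""
    return all(sum(a * c for a, c in zip(n, z)) - f <= 0 for n, f in system)

def segment_point(x0, y0, t):
    """Vector x0 + t (y0 - x0) of the line through x0 and y0."""
    return [a + t * (c - a) for a, c in zip(x0, y0)]

# Hypothetical constraints: z1 >= 0, z2 >= 0, z1 + z2 <= 4.
system = [((-1, 0), 0), ((0, -1), 0), ((1, 1), 4)]
x0, y0 = [0, 0], [1, 2]
inside = all(satisfies(system, segment_point(x0, y0, k / 10)) for k in range(11))
outside = satisfies(system, [3, 2])
```

Sampling finitely many values of $t$ proves nothing by itself, of
course; it merely agrees with what Theorem 47.2 guarantees.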

Exercises 

1. Prove that a set of vectors $z$ satisfying $(z, z) \le \alpha$
is convex.

2. Prove that if a vector z is in hyperplane (46.8), then the vector z + n is 
in the positive half-space. 

3. Prove that if hyperplanes are given by the normed equations (44.1) 
and (44.2), then the origin is always in the nonpositive half-space. 

48. Systems of linear equations 

We again turn to the study of systems of linear algebraic equations,
this time, however, in connection with the problem of the intersection
of hyperplanes.

Consider a real or complex space $K$ of dimension $m$. Suppose a
scalar product is introduced in it. Choose some orthonormal basis
and let $n_1, n_2, \ldots, n_k$ be normal vectors to the hyperplanes
$H_1, H_2, \ldots, H_k$ of (46.9) given by their coordinates

$$\begin{aligned} n_1 &= (a_{11}, a_{12}, \ldots, a_{1m}), \\ n_2 &= (a_{21}, a_{22}, \ldots, a_{2m}), \\ &\ \ \vdots \\ n_k &= (a_{k1}, a_{k2}, \ldots, a_{km}) \end{aligned} \tag{48.1}$$

in that basis. Assume that vectors

$$z = (z_1, z_2, \ldots, z_m)$$

lying in the intersection of the hyperplanes are also defined by their
coordinates in the same basis.

In the case of a real space the scalar product of vectors in the
orthonormal basis is the sum of pairwise products of coordinates.
Therefore in coordinate notation system (46.9) has the following
form:

$$\begin{aligned} a_{11} z_1 + a_{12} z_2 + \ldots + a_{1m} z_m &= b_1, \\ a_{21} z_1 + a_{22} z_2 + \ldots + a_{2m} z_m &= b_2, \\ &\ \ \vdots \\ a_{k1} z_1 + a_{k2} z_2 + \ldots + a_{km} z_m &= b_k. \end{aligned} \tag{48.2}$$

In the case of a complex space we again obtain a similar system,
except for the fact that the coefficients of the unknowns and the
right-hand sides are replaced by conjugate complex numbers.

So the problem of the intersection of hyperplanes reduces to the
problem, familiar from Section 22, of solving a system of linear
algebraic equations. It is obvious that any system of equations with
complex or real coefficients can also be investigated in terms of the
intersection of hyperplanes in a complex or real space $P_m$.

One of the main points is the investigation of a system of linear 
algebraic equations for compatibility. It is this point that deter¬ 
mines the answer to the question as to whether the intersection of 
hyperplanes is an empty set or a nonempty one. Of course, use can be 
made of the Gauss method to carry out this study. This is not always 
convenient, however. 

In studying systems of linear algebraic equations one has to deal 
with two matrices. One is made up of the coefficients of the unknowns 
and called the matrix of the system. It is as follows: 

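$$
\begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{1m}\\
a_{21} & a_{22} & \ldots & a_{2m}\\
\ldots & \ldots & \ldots & \ldots\\
a_{k1} & a_{k2} & \ldots & a_{km}
\end{pmatrix}.
$$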



The other results from adding to it a column of right-hand sides

$$
\begin{pmatrix}
a_{11} & a_{12} & \ldots & a_{1m} & b_1\\
a_{21} & a_{22} & \ldots & a_{2m} & b_2\\
\ldots & \ldots & \ldots & \ldots & \ldots\\
a_{k1} & a_{k2} & \ldots & a_{km} & b_k
\end{pmatrix}
$$

and is called the augmented matrix of the system. Note, in particular,
that the rank of the matrix of the system coincides with that of the
system of vectors (48.1).

Theorem 48.1 (Kronecker-Capelli). For a system of linear algebraic
equations to be compatible it is necessary and sufficient that the rank 
of the augmented matrix of the system should equal that of the matrix 
of the system. 

Proof. We use the notation of Section 22. Up to arrangement, the
vectors $a_1, a_2, \ldots, a_m, b$ are the columns of the matrices under
consideration. Since the rank of a matrix coincides with that of the
system of its column vectors, to prove the theorem it suffices to show
that the system is compatible if and only if the rank of the system
$a_1, a_2, \ldots, a_m$ coincides with that of the system
$a_1, a_2, \ldots, a_m, b$.

Suppose system (48.2) is compatible. This means that equation
(22.1) holds for some collection of numbers $z_1, z_2, \ldots, z_m$, i.e.
that $b$ is a linear combination of vectors $a_1, a_2, \ldots, a_m$. But it
follows that any basis of the system $a_1, a_2, \ldots, a_m$ is also a basis
of the system $a_1, a_2, \ldots, a_m, b$, i.e. that the ranks of both
systems coincide.

Now let the ranks of the systems coincide. Choose some basis
from $a_1, a_2, \ldots, a_m$. It will be a basis of the system
$a_1, a_2, \ldots, a_m, b$ as well. Consequently, $b$ is linearly
expressible in terms of some of the vectors of the system
$a_1, a_2, \ldots, a_m$. Since it can then be represented as a linear
combination of all vectors $a_1, a_2, \ldots, a_m$ too, this means that
system (48.2) is compatible. Thus the theorem is proved.
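
The compatibility criterion of Theorem 48.1 can be tried out on small examples. The following is only a sketch under stated assumptions: the helper names `rank` and `is_compatible` are illustrative and not part of the text, and the rank is computed by exact Gaussian elimination over fractions rather than by examining minors.

```python
from fractions import Fraction

def rank(rows):
    """Rank of a matrix given as a list of rows, by exact Gaussian elimination."""
    m = [[Fraction(v) for v in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivot rows found so far
    for c in range(n_cols):
        # look for a row at or below position r with a nonzero entry in column c
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(r + 1, len(m)):
            factor = m[i][c] / m[r][c]
            m[i] = [x - factor * y for x, y in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r

def is_compatible(a, b):
    """Kronecker-Capelli test: compare the ranks of the matrix and the augmented matrix."""
    augmented = [list(row) + [rhs] for row, rhs in zip(a, b)]
    return rank(a) == rank(augmented)

a = [[1, 2], [2, 4]]              # two parallel hyperplanes (lines) in the plane
print(is_compatible(a, [3, 6]))   # the lines coincide
print(is_compatible(a, [3, 7]))   # the lines are distinct and parallel
```

In the first case both ranks equal 1 and the intersection is nonempty; in the second the augmented matrix has rank 2 and the intersection is empty.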

The system of linear algebraic equations (48.2) is said to be
nonhomogeneous if the right-hand sides are not all zero. Otherwise the
system is said to be homogeneous. Any homogeneous system is always
compatible since one of its solutions is $z_1 = z_2 = \ldots = z_m = 0$.
A system obtained from (48.2) by replacing all the right-hand sides
with zeros is called the reduced homogeneous system. If (48.2) is
compatible, then each of its solutions is called a particular or
partial solution. The totality of particular solutions is called the
general solution of the system.

Using the previously obtained facts about planes and systems
(46.6), (46.7) and (46.9) describing the intersection of hyperplanes
we can draw a number of conclusions concerning the general solution
of a system of linear algebraic equations. Namely:

The general solution of a reduced homogeneous system forms in a
space $P_m$ a subspace of dimension $m - r$, where $r$ is the rank of the
matrix of the system. Any basis of that subspace is called a
fundamental system of solutions.

The general solution of a nonhomogeneous system is a plane in a
space $P_m$ obtained by translating the general solution of the reduced
homogeneous system by any particular solution of the
nonhomogeneous system.

The difference between any two particular solutions of a
nonhomogeneous system is a particular solution of the reduced
homogeneous system.

Among the particular solutions of a nonhomogeneous system there is 
only one orthogonal to all the solutions of the reduced homogeneous 
system. That solution is called normal. 

For a compatible system to have a unique solution it is necessary 
and sufficient that the rank of the matrix of the system should equal the 
number of unknowns. 

For a homogeneous system to have a nonzero solution it is necessary
and sufficient that the rank of the matrix of the system should be less
than the number of unknowns.

This study of systems of linear algebraic equations used the
concept of determinant only indirectly, mainly through the notion of
the rank of a matrix. The determinant plays a much greater role in
the theory of systems of equations, however.

Let the matrix of a system be a square one. For the rank of the 
matrix of the system to be less than the number of unknowns it is 
necessary and sufficient that the determinant of the system should 
be zero. Therefore 

A homogeneous system has a nonzero solution if and only if the
determinant of the system is zero.

Suppose now that the determinant of a system is nonzero. This
means that the rank of the matrix of the system equals $m$. The rank
of the augmented matrix cannot be less than $m$. But it cannot be
greater than $m$ either since there are no minors of order $m + 1$. Hence
the ranks of both matrices are equal, i.e. the system is necessarily
compatible in this case. Moreover, it has a unique solution. Thus

If the determinant of a system is nonzero, then the system always has 
a unique solution. 

In terms of the intersection of hyperplanes this fact may be
restated as follows:

If the normal vectors of hyperplanes form a basis of a space, then 
the intersection of the hyperplanes is not empty and contains only 
one vector. 

Denote by $d$ the determinant of a system of $n$ equations in $n$
unknowns and by $d_j$ a determinant differing from $d$ only in the $j$th
column being replaced in it by a column of the right-hand sides
$b_1, b_2, \ldots, b_n$. Then the unique solution of the system can be
obtained by the formulas

$$z_j = \frac{d_j}{d}, \qquad j = 1, 2, \ldots, n. \tag{48.3}$$

Indeed, let $A_{ij}$ denote the algebraic adjunct of an element $a_{ij}$ of
the determinant of the system. Expanding $d_j$ by the $j$th column we get

$$d_j = \sum_{i=1}^{n} b_i A_{ij}.$$

Now substitute expressions (48.3) in an arbitrary $k$th equation of
the system

$$
\begin{aligned}
\sum_{j=1}^{n} a_{kj} \frac{d_j}{d}
&= \frac{1}{d} \sum_{j=1}^{n} a_{kj} \sum_{i=1}^{n} b_i A_{ij}
 = \frac{1}{d} \sum_{j=1}^{n} \sum_{i=1}^{n} a_{kj} b_i A_{ij}\\
&= \frac{1}{d} \sum_{i=1}^{n} \sum_{j=1}^{n} b_i a_{kj} A_{ij}
 = \frac{1}{d} \sum_{i=1}^{n} b_i \Big( \sum_{j=1}^{n} a_{kj} A_{ij} \Big) = b_k.
\end{aligned}
$$

The inner sum in the last equation is by (40.5) and (40.8) equal to
either $d$ or zero according as $i = k$ or $i \ne k$.

Thus formulas (48.3) provide an explicit expression of the solution
of a system in terms of its elements. By virtue of uniqueness
there is no other solution. These formulas are known as Cramer's
rule.
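
Formulas (48.3) translate directly into a few lines of code. The sketch below is illustrative only: the names `det` and `cramer` are assumptions of this example, the determinant is computed by expansion along the first row (adequate for small systems, far too slow for large ones), and the inputs are assumed to be integers or fractions so that the arithmetic stays exact.

```python
from fractions import Fraction

def det(matrix):
    """Determinant by expansion along the first row (small matrices only)."""
    n = len(matrix)
    if n == 1:
        return matrix[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in matrix[1:]]
        total += (-1) ** j * matrix[0][j] * det(minor)
    return total

def cramer(a, b):
    """Solve a square system a z = b by formulas (48.3): z_j = d_j / d."""
    d = det(a)
    if d == 0:
        raise ValueError("determinant is zero; Cramer's rule does not apply")
    solution = []
    for j in range(len(a)):
        # d_j: the determinant with the j-th column replaced by the right-hand sides
        a_j = [row[:j] + [b[i]] + row[j + 1:] for i, row in enumerate(a)]
        solution.append(Fraction(det(a_j), d))
    return solution

a = [[2, 1], [1, 3]]
b = [3, 5]
z = cramer(a, b)
# substitute the solution back into every equation of the system
residuals = [sum(a_kj * z_j for a_kj, z_j in zip(row, z)) - b_k
             for row, b_k in zip(a, b)]
print(z, residuals)
```

Here $d = 5$, $d_1 = 4$ and $d_2 = 7$, so the solution is $z_1 = 4/5$, $z_2 = 7/5$ and both residuals vanish.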

Formally, by calculating determinants it is possible to solve any
system of equations of the form (48.2). First, calculating the
various minors of the matrix of the system and those of the augmented
matrix we check the system for compatibility. Let it be compatible
and let the rank of both matrices be $r$. We may assume without loss
of generality that the principal minor of order $r$ of the matrix of the
system is nonzero. By Theorem 41.1 the last $k - r$ rows of the
augmented matrix are linear combinations of its first $r$ rows. Hence
the system

$$
\begin{aligned}
a_{11} z_1 + \ldots + a_{1r} z_r + a_{1,r+1} z_{r+1} + \ldots + a_{1m} z_m &= b_1,\\
a_{21} z_1 + \ldots + a_{2r} z_r + a_{2,r+1} z_{r+1} + \ldots + a_{2m} z_m &= b_2,\\
\ldots\qquad\qquad\qquad\qquad &\\
a_{r1} z_1 + \ldots + a_{rr} z_r + a_{r,r+1} z_{r+1} + \ldots + a_{rm} z_m &= b_r
\end{aligned}
\tag{48.4}
$$

is equivalent to system (48.2).

As before, the unknowns $z_{r+1}, \ldots, z_m$ are called free unknowns.
Assigning to them any values it is possible to determine all the other
unknowns by solving the system with the matrix of the principal
minor, for example by Cramer's rule.

This method of solving a system is of some value only for
theoretical studies. In practice, however, it is much more advantageous
to employ the Gauss method described in Section 22.

Exercises 

1. Prove that a general solution is a convex set. 

2. Prove that among all particular solutions of a nonhomogeneous system 
the normal solution is the shortest. 

3. Prove that a fundamental system is a collection of any m — r solutions 
of the reduced homogeneous system for which the determinant made up of the 
values of the free unknowns is nonzero. 

4. It was noted in Section 22 that small changes in coordinates may result
in the violation of linear dependence or independence of vectors. What
conclusions can be drawn from this concerning the hyperplane intersection problem?






CHAPTER 6 


The Limit in Vector Space 


49. Metric spaces 

One of the basic concepts of mathematical analysis
is the notion of limit. It is based on the fact that for the points
of the number line the concept of "closeness" or more precisely of
the distance between points is defined.

Comparison for "closeness" can be introduced in sets of a quite
different nature. We have already defined in Section 29 the distance
between vectors of vector spaces with scalar product. It was found
that it possesses the same properties (29.5) as those the distance
between points of the number line has. The distance between vectors
was defined in terms of a scalar product which in turn was
introduced axiomatically.

It is natural to attempt to introduce axiomatically the distance 
itself by requiring that properties (29.5) should necessarily hold. 

Notice that many fundamental facts of the theory of limit in
mathematical analysis are not associated with the fact that
algebraic operations are defined for numbers. We therefore begin by
extending the concept of distance to arbitrary sets of elements that are
not necessarily vectors of a vector space.

A set is said to be a metric space if every pair of its elements is
assigned a nonnegative real number called a distance, with the
following axioms holding:

(1) $\rho(x, y) = \rho(y, x)$,

(2) $\rho(x, y) > 0$ if $x \ne y$; $\rho(x, y) = 0$ if $x = y$,

(3) $\rho(x, y) \le \rho(x, z) + \rho(z, y)$


for any elements $x$, $y$ and $z$. These axioms are called axioms of a
metric, the first being the axiom of symmetry and the third the
triangle axiom.

Formally any set of elements in which the equality relation is
defined can be converted into a metric space by setting

$$
\rho(x, y) =
\begin{cases}
0 & \text{if } x = y,\\
1 & \text{if } x \ne y.
\end{cases}
\tag{49.1}
$$

It is easy to verify that all the axioms of a metric hold.

The element $x_0$ of a metric space $X$ is said to be the limit of a
sequence $\{x_n\}$ of elements $x_1, x_2, \ldots, x_n, \ldots$ of $X$ if the
sequence of distances $\rho(x_0, x_1), \rho(x_0, x_2), \ldots, \rho(x_0, x_n), \ldots$
converges to zero. In this case we write

$$x_n \to x_0$$

or

$$\lim_{n \to \infty} x_n = x_0$$

and call the sequence $\{x_n\}$ convergent in $X$ or simply convergent.

Notice that the same sequence of elements of the same set $X$ may
be convergent or nonconvergent depending on what metric is
introduced in $X$. Suppose, for example, that in a metric space $X$ some
convergent sequence $\{x_n\}$ is chosen consisting of mutually distinct
elements. We change the metric in $X$ by introducing it according to
(49.1). Now $\{x_n\}$ is no longer convergent. Indeed, suppose $x_n \to x'$,
i.e. $\rho(x_n, x') \to 0$. With the new metric this is possible if and only
if all elements of $\{x_n\}$, except for a finite number of them, coincide
with $x'$. The contradiction obtained confirms the above assertion.
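
The dependence of convergence on the metric is easy to observe numerically. The following sketch is an illustration under assumptions of this example only: the sequence $x_n = 1/n$ on the number line, the candidate limit 0, and the function names `usual` and `discrete` are not taken from the text.

```python
def usual(x, y):
    """The ordinary distance on the number line."""
    return abs(x - y)

def discrete(x, y):
    """Metric (49.1): 0 for equal elements, 1 for distinct ones."""
    return 0 if x == y else 1

# mutually distinct terms 1, 1/2, ..., 1/1000 and the candidate limit 0
sequence = [1 / n for n in range(1, 1001)]
limit = 0.0

usual_tail = [usual(x, limit) for x in sequence[-5:]]
discrete_tail = [discrete(x, limit) for x in sequence[-5:]]
print(usual_tail)      # distances shrink towards zero
print(discrete_tail)   # distances stay equal to 1
```

In the ordinary metric the distances to 0 tend to zero, while in metric (49.1) they remain equal to 1. The same happens for any other candidate limit, since at most one term of the sequence can coincide with it.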

The following two properties are shared by any convergent
sequence:

If $\{x_n\}$ converges, then any of its subsequences also converges and
has the same limit. A sequence may have no more than one limit.

The first property is obvious. Suppose $\{x_n\}$ has two limits, $x_0$
and $y_0$. Then for any arbitrarily small number $\varepsilon > 0$ we can find a
number $N$ such that

$$\rho(x_0, x_n) < \frac{\varepsilon}{2}, \qquad \rho(y_0, x_n) < \frac{\varepsilon}{2}$$

for every $n > N$. But from this, using the triangle axiom, we find

$$\rho(x_0, y_0) \le \rho(x_0, x_n) + \rho(x_n, y_0) < \varepsilon.$$

By virtue of the arbitrariness of $\varepsilon$ this means that $\rho(x_0, y_0) = 0$,
i.e. $x_0 = y_0$.

A sphere $S(a, r)$ in a metric space $X$ is the set of all elements
$x \in X$ satisfying the condition

$$\rho(a, x) < r. \tag{49.2}$$

An element $a$ is called the centre of the sphere and $r$ is its radius.
Any sphere with centre in $a$ is a neighbourhood of $a$. A set of elements
is said to be bounded if it is entirely in some sphere.

It is easy to see that $x_0$ is the limit of $\{x_n\}$ if and only if any
neighbourhood of $x_0$ contains all the elements of $\{x_n\}$ beginning with
some index.

In a metric space it is possible to introduce many of the other
major concepts dealt with in number sets. Thus if a set $M \subset X$ is
given, then an element $x \in X$ is said to be a cluster point (or limit point
or accumulation point) of that set if any neighbourhood of $x$ contains
at least one element of $M$ distinct from $x$. The set obtained by
joining to $M$ all its cluster points is called the closure of $M$ and
designated $\overline{M}$. The set $M$ is said to be closed if $M = \overline{M}$.

Consider the cluster points of sphere (49.2). We show that they
all satisfy the condition

$$\rho(a, x) \le r. \tag{49.3}$$

Indeed, suppose there is at least one cluster point $x'$ for sphere
(49.2) such that $\rho(a, x') > r$. By the definition of a cluster point
any neighbourhood of $x'$ must contain at least one element of (49.2)
distinct from $x'$. But a neighbourhood with radius $0.5\,(\rho(a, x') - r)$
clearly contains no such element. Accordingly:

A set $\overline{S}(a, r)$ of all elements $x$ satisfying (49.3) is a closed sphere.


Exercises 

1. Prove that if $x_n \to x$, then $\rho(x_n, z) \to \rho(x, z)$
for any element $z$.

2. Will the set of all real numbers be a metric space if for numbers $x$ and $y$
we set

$$\rho(x, y) = \arctan |x - y|?$$

3. Can a set consisting of a finite number of elements have cluster points? 

50. Complete spaces 

A sequence $\{x_n\}$ of elements of a metric space
is said to be a fundamental or Cauchy sequence if given any number
$\varepsilon > 0$ we can find $N$ such that $\rho(x_n, x_m) < \varepsilon$ for $n, m > N$.

Any fundamental sequence is bounded. Indeed, given $\varepsilon$ choose $N$
according to the definition and take a number $n_0 > N$. All
elements of the sequence, beginning with $x_{n_0}$, are clearly in a sphere
with centre $x_{n_0}$ and of radius $\varepsilon$. The entire collection of the elements,
however, is in a sphere with centre $x_{n_0}$ and of any radius greater than
the maximum of the numbers

$$\varepsilon, \ \rho(x_1, x_{n_0}), \ \ldots, \ \rho(x_{n_0 - 1}, x_{n_0}).$$

If a sequence is convergent, then it is fundamental. Let a sequence
$\{x_n\}$ converge to $x_0$. Then given any $\varepsilon > 0$ there is $N$ such that

$$\rho(x_n, x_0) < \frac{\varepsilon}{2}$$

for $n > N$. By the triangle axiom

$$\rho(x_n, x_m) \le \rho(x_n, x_0) + \rho(x_0, x_m) < \varepsilon$$

for $n, m > N$, which precisely means that $\{x_n\}$ is fundamental.

For the set of all reals the converse is also true. That is, any
fundamental sequence is convergent. In general, however, this is not
true, which is exemplified by a metric space with at least one cluster
point eliminated.

A metric space is said to be complete if any fundamental sequence 
in it is convergent. 

In complete metric spaces a theorem holds similar to the theorem 
on embedded line segments for real numbers. Given some sequence 
of spheres, we shall say that they are embedded into one another if 
every subsequent sphere is contained inside the preceding one. 

Theorem 50.1. In a complete metric space $X$ let $\{\overline{S}(a_n, \varepsilon_n)\}$ be a
sequence of closed spheres embedded into one another. If the sequence
of their radii tends to zero, then there is a unique element in $X$ which
is in all those spheres.

Proof. Consider a sequence $\{a_n\}$. Since
$\overline{S}(a_{n+p}, \varepsilon_{n+p}) \subset \overline{S}(a_n, \varepsilon_n)$
for any $p \ge 0$, we have $a_{n+p} \in \overline{S}(a_n, \varepsilon_n)$. Consequently,

$$\rho(a_{n+p}, a_n) \le \varepsilon_n,$$

which implies that $\{a_n\}$ is fundamental.

The space $X$ is complete and therefore $\{a_n\}$ tends to some limit $a$
in $X$. Take any sphere $\overline{S}(a_k, \varepsilon_k)$. This sphere contains all terms
of $\{a_n\}$, beginning with $a_k$. By virtue of the closure of the spheres
the limit of $\{a_n\}$ is also in $\overline{S}(a_k, \varepsilon_k)$. Thus $a$ is in all the spheres.

Assume further that there is another element, $b$, that is also in
all the spheres. By the triangle axiom

$$\rho(a, b) \le \rho(a, a_n) + \rho(a_n, b) \le 2\varepsilon_n.$$

Since $\varepsilon_n$ may be taken as small as we like, this means that
$\rho(a, b) = 0$, i.e. that $a = b$.
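
On the number line a closed sphere $\overline{S}(a, \varepsilon)$ is the closed interval $[a - \varepsilon, a + \varepsilon]$, so Theorem 50.1 contains the theorem on embedded line segments. As an illustration only (the bisection procedure, the target number $\sqrt{2}$ and the name `nested_spheres` are assumptions of this sketch, not part of the text), one can build such a sequence of spheres and watch the radii shrink:

```python
def nested_spheres(steps):
    """Closed intervals [lo, hi] containing sqrt(2), stored as (centre, radius)."""
    lo, hi = 1.0, 2.0
    spheres = []
    for _ in range(steps):
        centre, radius = (lo + hi) / 2, (hi - lo) / 2
        spheres.append((centre, radius))
        # keep the half of the interval that still contains sqrt(2)
        if centre * centre <= 2:
            lo = centre
        else:
            hi = centre
    return spheres

spheres = nested_spheres(30)
for centre, radius in spheres[-3:]:
    print(centre, radius)
```

Every sphere lies inside the preceding one, the radii are halved at each step, and the centres form a fundamental sequence whose limit, the only point common to all the spheres, is $\sqrt{2}$.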

The most important examples of complete spaces are the sets of real
and complex numbers. We assume that the distance between numbers
coincides with the absolute value of their difference. The
completeness of the set of reals is proved in the course of mathematical
analysis. We show the completeness of the set of complex numbers.

Assume that complex numbers are given in algebraic form. The
distance between numbers

$$z = a + ib, \qquad v = c + id$$

is introduced by the rule

$$\rho(z, v) = |z - v|, \tag{50.1}$$

where

$$|z - v|^2 = (a - c)^2 + (b - d)^2. \tag{50.2}$$

It is obvious that the axioms of a metric hold.

Consider a sequence $\{z_k = a_k + ib_k\}$ of complex numbers. Let it
be fundamental. Given $\varepsilon > 0$ there is $N$ such that for all $n, m > N$

$$|z_n - z_m| < \varepsilon.$$

From (50.2) it follows that

$$|a_n - a_m| < \varepsilon, \qquad |b_n - b_m| < \varepsilon, \tag{50.3}$$

i.e. that $\{a_k\}$ and $\{b_k\}$ are also fundamental. By virtue of the
completeness of the set of reals there are $a$ and $b$ such that

$$a_k \to a, \qquad b_k \to b.$$

Proceeding in (50.3) to the limit we get

$$|a_n - a| \le \varepsilon, \qquad |b_n - b| \le \varepsilon.$$

On setting

$$z = a + ib,$$

we find that

$$\rho(z_n, z) \le \sqrt{2}\,\varepsilon$$

for all $n > N$. But this means that the fundamental sequence $\{z_k\}$
is convergent.

As a consequence, $\{z_k = a_k + ib_k\}$ converges to a number
$z = a + ib$ if and only if $\{a_k\}$ and $\{b_k\}$ converge to $a$ and $b$.

The complete space of complex numbers has much in common with
the space of real numbers. In particular, any bounded sequence of
complex numbers has a convergent subsequence. Indeed, this is
true for any bounded sequence of reals. It is also obvious that the
boundedness of $\{z_k = a_k + ib_k\}$ implies the boundedness of $\{a_k\}$
and $\{b_k\}$. Since $\{a_k\}$ is bounded, it has a convergent subsequence
$\{a_{\nu_k}\}$. Consider a sequence $\{b_{\nu_k}\}$. It is bounded and therefore it
too has a convergent subsequence $\{b_{\nu_{k_n}}\}$. It is clear that
$\{a_{\nu_{k_n}}\}$ will be convergent. Consequently, so is $\{z_{\nu_{k_n}}\}$.

In the complex space, just as in the real one, the concept of
infinitely large sequence can be introduced. Namely, a sequence $\{z_k\}$ is
said to be infinitely large if given an arbitrarily large number $A$
we can find $N$ such that $|z_k| > A$ for every $k > N$. It is obvious
that we can always choose an infinitely large subsequence of any
unbounded sequence.


Exercises 

1. Will the set of all real numbers be a complete
space if for numbers $x$ and $y$ we set

$$\rho(x, y) = \arctan |x - y|?$$

2. Prove that any closed set of a complete space is a complete space.

3. Will any closed set of an arbitrary metric space be necessarily a
complete space?

4. Construct a metric such that the set of all complex numbers is not a
complete space.

51. Auxiliary inequalities


We establish some inequalities to be used in
our immediate studies. Take a positive number $a$ and consider the
exponential function $y = a^x$ (Fig. 51.1). Let $x_1$ and $x_2$ be two
distinct reals. Draw a straight line through points with coordinates
$(x_1, a^{x_1})$ and $(x_2, a^{x_2})$. Taking into account the properties of the
exponential function we conclude that as the independent variable
ranges over the closed interval $[x_1, x_2]$, no point of the graph
lies higher than the corresponding point of the constructed straight line.

Now let $x_1 = 0$ and $x_2 = 1$. Then the equation of the straight line under
consideration will be $y = ax + (1 - x)$. Consequently,

$$a^x \le ax + (1 - x) \tag{51.1}$$

for $0 \le x \le 1$.

Fig. 51.1

We shall say that positive numbers $p$ and $q$ are conjugate if they
satisfy the relation

$$\frac{1}{p} + \frac{1}{q} = 1. \tag{51.2}$$

It is clear that $p, q > 1$.

For any positive numbers $a$ and $b$, the number $a^p b^{-q}$ will also
be positive and it can be taken as $a$ of (51.1). If we assume $x = p^{-1}$,
then $1 - x = q^{-1}$. Now from (51.1), after multiplying both sides
by $b^q$, we get

$$ab \le \frac{a^p}{p} + \frac{b^q}{q} \tag{51.3}$$

for any positive $a$ and $b$ and conjugate $p$ and $q$. It is obvious that
in fact this inequality holds for all nonnegative $a$ and $b$.

Consider two arbitrary vectors $x$ and $y$ in the space $R_n$ or $C_n$.
Let these vectors be given by their coordinates

$$
\begin{aligned}
x &= (x_1, x_2, \ldots, x_n),\\
y &= (y_1, y_2, \ldots, y_n).
\end{aligned}
$$

We establish the so-called Hölder's inequality

$$\sum_{k=1}^{n} |x_k y_k| \le \Big( \sum_{k=1}^{n} |x_k|^p \Big)^{1/p} \Big( \sum_{k=1}^{n} |y_k|^q \Big)^{1/q}. \tag{51.4}$$

Notice that if at least one of the vectors $x$ and $y$ is zero, then
Hölder's inequality is obviously true. It may therefore be assumed that
$x \ne 0$ and $y \ne 0$. Let the inequality hold for some nonzero
vectors $x$ and $y$. Then it does for vectors $\lambda x$ and $\mu y$, with any $\lambda$ and $\mu$.
Therefore it suffices to prove it for the case where

$$\sum_{k=1}^{n} |x_k|^p = \sum_{k=1}^{n} |y_k|^q = 1. \tag{51.5}$$

Now putting $a = |x_k|$ and $b = |y_k|$ in (51.3) and summing over
$k$ from 1 to $n$ we get in view of (51.2) and (51.5)

$$\sum_{k=1}^{n} |x_k y_k| \le 1.$$

But this is Hölder's inequality for (51.5).
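
Inequality (51.4) can be checked numerically for particular vectors. The sketch below is only an illustration: the function name `holder_sides` and the sample vectors are assumptions of this example, and the conjugate exponent is computed from (51.2) as $q = p/(p - 1)$.

```python
def holder_sides(x, y, p):
    """Return the left and right sides of Hölder's inequality (51.4) for p > 1."""
    q = p / (p - 1)  # conjugate exponent, so that 1/p + 1/q = 1
    left = sum(abs(a * b) for a, b in zip(x, y))
    right = (sum(abs(a) ** p for a in x) ** (1 / p)
             * sum(abs(b) ** q for b in y) ** (1 / q))
    return left, right

x = [1, -2, 3]
y = [4, 0.5, -1]
for p in (1.5, 2, 3):
    left, right = holder_sides(x, y, p)
    print(p, left, right, left <= right)
```

For $p = q = 2$ the right side is the product of the Euclidean lengths of the two vectors. The coordinates may equally well be complex numbers, since only their absolute values enter the formula.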

We now proceed to prove Minkowski's inequality which, for any
vectors $x$ and $y$ from $R_n$ or $C_n$, says that

$$\Big( \sum_{k=1}^{n} |x_k + y_k|^p \Big)^{1/p} \le \Big( \sum_{k=1}^{n} |x_k|^p \Big)^{1/p} + \Big( \sum_{k=1}^{n} |y_k|^p \Big)^{1/p} \tag{51.6}$$

for every $p \ge 1$.

Minkowski's inequality is obvious for $p = 1$, since the absolute
value of the sum of two numbers is at most the sum of their absolute
values. Moreover, it clearly holds if at least one of the vectors $x$
and $y$ is zero. Therefore we may restrict ourselves to the case $p > 1$
and $x \ne 0$. We write the identity

$$(|a| + |b|)^p = (|a| + |b|)^{p-1} |a| + (|a| + |b|)^{p-1} |b|.$$

Setting $a = x_k$ and $b = y_k$ and summing over $k$ from 1 to $n$ we get

$$\sum_{k=1}^{n} (|x_k| + |y_k|)^p = \sum_{k=1}^{n} (|x_k| + |y_k|)^{p-1} |x_k| + \sum_{k=1}^{n} (|x_k| + |y_k|)^{p-1} |y_k|.$$

Applying to each of the two sums at the right Hölder's inequality
and considering that $(p - 1) q = p$ we get

$$\sum_{k=1}^{n} (|x_k| + |y_k|)^p \le \Big( \sum_{k=1}^{n} (|x_k| + |y_k|)^p \Big)^{1/q} \Big( \Big( \sum_{k=1}^{n} |x_k|^p \Big)^{1/p} + \Big( \sum_{k=1}^{n} |y_k|^p \Big)^{1/p} \Big).$$

On dividing both sides of the inequality by

$$\Big( \sum_{k=1}^{n} (|x_k| + |y_k|)^p \Big)^{1/q}$$

we find, since $1 - 1/q = 1/p$, that

$$\Big( \sum_{k=1}^{n} (|x_k| + |y_k|)^p \Big)^{1/p} \le \Big( \sum_{k=1}^{n} |x_k|^p \Big)^{1/p} + \Big( \sum_{k=1}^{n} |y_k|^p \Big)^{1/p},$$

from which, in view of $|x_k + y_k| \le |x_k| + |y_k|$, we at once obtain
inequality (51.6).


Exercises 

1. Derive the Cauchy-Buniakowski-Schwarz inequality
from Hölder's.

2. Study Hölder's inequality for $p \to \infty$.

3. Study Minkowski's inequality for $p \to \infty$.

52. Normed spaces 

We have arrived at the concept of metric 
space by concentrating on a single property of a set, that of distance. 
Similarly, by concentrating on operations in a set we have arrived 
at the concept of vector space. We now discuss vector spaces 
with a metric. 

It is obvious that if the concept of distance is in no way connected
with operations on elements, it is impossible to construct an
interesting theory whose facts would join together algebraic and metric
notions. We shall therefore impose additional conditions on the
metric introduced in a vector space.

In fact we have already encountered metric vector spaces. These
are, for example, a Euclidean and a unitary space with metric
(29.4). The need for such a metric, however, does not always arise.
Introducing a scalar product actually means introducing not
only distance between elements but also angles between them. In a
vector space it is most often required to introduce only an acceptable
definition of distance. The most important vector spaces of this
kind are the so-called normed spaces.

A real or complex vector space $X$ is said to be a normed space if
each vector $x \in X$ is assigned a real number $\|x\|$ called the norm
of a vector $x$, the following axioms holding:

(1) $\|x\| > 0$ if $x \ne 0$, $\|0\| = 0$,

(2) $\|\lambda x\| = |\lambda| \, \|x\|$,

(3) $\|x + y\| \le \|x\| + \|y\|$

for any vectors $x$ and $y$ and any number $\lambda$. The second axiom is
called the axiom of the absolute homogeneity of a norm and the third is
the triangle inequality axiom.

From the axiom of the absolute homogeneity of a norm it follows
that for any nonzero vector $x$ we can find a number $\lambda$ such that the
norm of a vector $\lambda x$ equals unity. To do this it suffices to take
$\lambda = \|x\|^{-1}$. A vector whose norm equals unity will be called a
normed vector.

The triangle inequality for a norm yields one useful relation.
We have $\|x\| \le \|y\| + \|x - y\|$ for any $x$ and $y$. Hence
$\|x\| - \|y\| \le \|x - y\|$. Interchanging $x$ and $y$ we find
$\|y\| - \|x\| \le \|x - y\|$. Therefore

$$\big| \, \|x\| - \|y\| \, \big| \le \|x - y\|. \tag{52.1}$$

A normed space is easy to convert into a metric one by setting

$$\rho(x, y) = \|x - y\|. \tag{52.2}$$

Indeed, $\rho(x, y) = 0$ means $\|x - y\| = 0$, which according to
axiom (1) means that $x = y$. The symmetry of the distance
introduced is obvious. Finally, the triangle inequality for a distance is a
simple consequence of the triangle inequality for a norm. That is,

$$\rho(x, y) = \|x - y\| = \|(x - z) + (z - y)\| \le \|x - z\| + \|z - y\| = \rho(x, z) + \rho(z, y).$$

Notice that

$$\|x\| = \rho(x, 0). \tag{52.3}$$

Metric (52.2) defined in a vector space possesses also the following
properties:

$$\rho(x + z, y + z) = \rho(x, y)$$

for any $x, y, z \in X$, i.e. distance remains unaffected by a translation
of vectors and

$$\rho(\lambda x, \lambda y) = |\lambda| \, \rho(x, y)$$

for any vectors $x, y \in X$ and any number $\lambda$, i.e. distance is an
absolutely homogeneous function.

If in a metric vector space $X$ some metric satisfies these two
additional requirements, then $X$ may be regarded as a normed space if
a norm is defined by equation (52.3) for any $x \in X$.

Taking into account the relations of Section 29 it is easy to estab¬ 
lish that a vector space with scalar product is a normed space. It 
should be assumed that the norm of a vector is its length. 

It is possible to give other examples of introducing a norm. In a
vector space let vectors be given by their coordinates in some basis.
If $x = (x_1, x_2, \ldots, x_n)$, then we set

$$\|x\|_p = \Big( \sum_{k=1}^{n} |x_k|^p \Big)^{1/p}, \tag{52.4}$$

where $p \ge 1$. That the first two axioms hold for a norm is obvious
and the third axiom is seen to hold from Minkowski's inequality
(51.6). Most commonly used are the following norms:

$$\|x\|_1 = \sum_{k=1}^{n} |x_k|, \qquad \|x\|_2 = \Big( \sum_{k=1}^{n} |x_k|^2 \Big)^{1/2}, \qquad \|x\|_\infty = \max_{1 \le k \le n} |x_k|. \tag{52.5}$$

The second of the norms is often called a Euclidean norm and
designated $\|x\|_E$.
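
Norms (52.4) and (52.5) are simple to compute. The following sketch is an illustration only: the function name `norm_p` and the convention of passing `float("inf")` for the maximum norm are assumptions of this example.

```python
def norm_p(x, p):
    """Norm (52.4) of a coordinate vector x; p = float('inf') gives the maximum norm."""
    if p == float("inf"):
        return max(abs(c) for c in x)
    return sum(abs(c) ** p for c in x) ** (1 / p)

x = [3, -4]
y = [1, 2]
s = [a + b for a, b in zip(x, y)]

for p in (1, 2, 3, float("inf")):
    # the norm of x, and both sides of the triangle inequality for x and y
    print(p, norm_p(x, p), norm_p(s, p), norm_p(x, p) + norm_p(y, p))
```

For the vector $x = (3, -4)$ the three norms of (52.5) equal 7, 5 and 4 respectively, and in each case the norm of $x + y$ does not exceed the sum of the norms, in agreement with Minkowski's inequality.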

In what follows we shall consider only normed spaces with metric 
(52.2). The convergence of a sequence of vectors in such a metric is 
called the convergence in the norm, the boundedness of a set of vectors 
is called the boundedness in the norm, etc. 

Exercises 

1. Prove that there is a sequence of vectors whose
norms form an infinitely large sequence.

2. Prove that for any numbers $\lambda_i$ and vectors $e_i$

$$\Big\| \sum_{i=1}^{n} \lambda_i e_i \Big\| \le \sum_{i=1}^{n} |\lambda_i| \, \|e_i\|.$$

3. Prove that if $x_n \to x$, $y_n \to y$ and $\lambda_n \to \lambda$, then

$$\|x_n\| \to \|x\|, \qquad x_n + y_n \to x + y, \qquad \lambda_n x_n \to \lambda x.$$

53. Convergence in the norm and 
coordinate convergence 

In a real or complex finite dimensional vector
space it is possible to introduce another concept of convergence
besides the convergence in the norm. Consider a space $X$ and let
$e_1, e_2, \ldots, e_n$ be its basis. For any sequence $\{x_m\}$ of vectors
of $X$ there are expansions

$$x_m = \sum_{k=1}^{n} \xi_k^{(m)} e_k. \tag{53.1}$$

If for the vector

$$x_0 = \sum_{k=1}^{n} \xi_k^{(0)} e_k \tag{53.2}$$

there are limiting relations

$$\lim_{m \to \infty} \xi_k^{(m)} = \xi_k^{(0)} \tag{53.3}$$

for every $k = 1, 2, \ldots, n$, then we shall say that there is a
coordinate convergence of $\{x_m\}$ to a vector $x_0$.

Coordinate convergence is quite natural. If two vectors are "close",
then it may be assumed that so should the corresponding
coordinates in the expansion with respect to the same basis. Finite
dimensional normed spaces are noteworthy for the fact that the notions of
convergence in the norm and of coordinate convergence are
equivalent in them.

It is easy to show that coordinate convergence implies
convergence in the norm. Indeed, let (53.3) hold. Using the axioms of the
absolute homogeneity of a norm and of the triangle inequality we
conclude from (53.1) and (53.2) that

$$\|x_m - x_0\| = \Big\| \sum_{k=1}^{n} (\xi_k^{(m)} - \xi_k^{(0)}) e_k \Big\| \le \sum_{k=1}^{n} |\xi_k^{(m)} - \xi_k^{(0)}| \, \|e_k\| \to 0.$$

The proof of the converse is essentially more involved.

Lemma 53.1. If in a finite dimensional normed space a sequence of
vectors is norm bounded, then the numerical sequences of all the
coordinates in the expansion of vectors with respect to any basis are
bounded too.

Proof. Let each vector of a sequence $\{x_m\}$ be represented in the
form (53.1). We introduce the notation

$$\sigma_m = \sum_{k=1}^{n} |\xi_k^{(m)}|$$

and prove that $\{\sigma_m\}$ is bounded.

Suppose this is not the case. Then we may choose an infinitely
large subsequence $\{\sigma_{m_p}\}$. Set

$$y_p = \frac{1}{\sigma_{m_p}} x_{m_p};$$

if

$$y_p = \sum_{k=1}^{n} \eta_k^{(p)} e_k,$$

then of course

$$\eta_k^{(p)} = \frac{\xi_k^{(m_p)}}{\sigma_{m_p}}$$

for every $k = 1, 2, \ldots, n$ and every $m_p$ and we get

$$\sum_{k=1}^{n} |\eta_k^{(p)}| = 1. \tag{53.4}$$

Sequences $\{\eta_k^{(p)}\}$ are bounded since by (53.4) $|\eta_k^{(p)}| \le 1$. Hence
it is possible to choose a subsequence of vectors $\{y_{p_1}\}$ such that
$\{\eta_1^{(p_1)}\}$ is convergent, i.e.

$$\lim_{p_1 \to \infty} \eta_1^{(p_1)} = \eta_1$$

for some number $\eta_1$. It is possible to choose in turn a subsequence
$\{y_{p_2}\}$ of $\{y_{p_1}\}$ such that

$$\lim_{p_2 \to \infty} \eta_2^{(p_2)} = \eta_2$$

for some number $\eta_2$. As before

$$\lim_{p_2 \to \infty} \eta_1^{(p_2)} = \eta_1.$$

Continuing the process we choose a subsequence $\{y_{p_n}\}$ of $\{y_p\}$
such that there are limits

$$\lim_{p_n \to \infty} \eta_k^{(p_n)} = \eta_k \tag{53.5}$$

for every $k = 1, 2, \ldots, n$. By (53.4)

$$\sum_{k=1}^{n} |\eta_k| = 1. \tag{53.6}$$

Since coordinate convergence implies convergence in the norm,
limiting relations (53.5) mean that

$$\lim_{p_n \to \infty} \|y_{p_n} - y\| = 0, \tag{53.7}$$

where

$$y = \sum_{k=1}^{n} \eta_k e_k.$$

By virtue of (53.6) $y$ must not equal zero. On the other hand,

$$\lim_{p_n \to \infty} \|y_{p_n}\| = \lim_{p_n \to \infty} \frac{\|x_{m_{p_n}}\|}{\sigma_{m_{p_n}}} = 0,$$

since the sequence $\{x_{m_p}\}$ is norm bounded and $\{\sigma_{m_p}\}$ is infinitely
large. Hence it follows from (53.7) that $\|y\| = 0$, i.e. that $y$ is
a zero vector. This contradiction proves the lemma.

Theorem 53.1. In a finite dimensional normed space convergence in 
the norm implies coordinate convergence. 

Proof. Let $\{x_m\}$ be a sequence of vectors converging in the norm
to a vector $x_0$. It is obvious that it suffices to consider the case where
$x_0 = 0$ and $\{x_m\}$ contains no zero vectors. We represent vectors $x_m$
as (53.1). The sequence of vectors

$$y_m = \frac{1}{\|x_m\|} x_m$$

will be norm bounded and by Lemma 53.1 the sequences of numbers

$$\eta_k^{(m)} = \frac{\xi_k^{(m)}}{\|x_m\|}$$

must be bounded for every $k = 1, 2, \ldots, n$. Since $\|x_m\| \to 0$,
this is possible if and only if $\xi_k^{(m)} \to 0$ for every $k$. But this means that
there is coordinate convergence of the sequence $\{x_m\}$ to the
vector $x_0$.
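
The equivalence of the two kinds of convergence can be observed on a concrete sequence. The sketch below is an illustration under assumptions of this example only: the space is taken to be $R_3$ with its standard basis, the sequence $x_m = (1/m,\ 2 + 1/m^2,\ -2^{-m})$ and its limit $x_0 = (0, 2, 0)$ are chosen arbitrarily, and the distance to the limit is measured in the three norms (52.5).

```python
def x_m(m):
    """A sample sequence in R_3 whose coordinates converge to (0, 2, 0)."""
    return [1 / m, 2 + 1 / m ** 2, -(0.5 ** m)]

x_0 = [0.0, 2.0, 0.0]

def difference_norms(m):
    """Norms (52.5) of x_m - x_0: the 1-norm, the Euclidean norm, the maximum norm."""
    diff = [a - b for a, b in zip(x_m(m), x_0)]
    one = sum(abs(c) for c in diff)
    two = sum(c * c for c in diff) ** 0.5
    inf = max(abs(c) for c in diff)
    return one, two, inf

for m in (1, 10, 100, 1000):
    print(m, difference_norms(m))
```

All three columns tend to zero together with the coordinate differences. For every vector of $R_3$ the maximum norm does not exceed the Euclidean norm, which does not exceed the 1-norm, which in turn is at most three times the maximum norm, so convergence in any one of these norms forces convergence in the others.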

Coordinate convergence can be efficiently used in theoretical
studies, but in practical applications it is more convenient to use
convergence in the norm. This is mainly because in the study of
vector spaces of high dimension it is hard to deal with a large number
of coordinate sequences. Besides, we do not always know even
one basis. But even if we do, its use most often results in
unjustifiably lengthy calculations.


Exercises 

1. Is the requirement that a space should be finite
dimensional essential in proving the equivalence of the two kinds of
convergence?

2. Prove that if some set of vectors of a finite dimensional space is bounded
in one norm, then it will be bounded in any other norm.

3. Prove that if in a finite dimensional space $x_n \to x$ in one norm, then
$x_n \to x$ in any other norm.


54. Completeness of normed spaces 

Finite dimensional normed spaces are spaces
where many statements similar to those associated with the concept
of limit in number sets are true. Consider some of them.

Lemma 54.1. It is possible to choose a subsequence of any bounded 
sequence of vectors of a finite dimensional normed space, convergent in 
that space. 

Proof. Let $\{x_m\}$ be an arbitrary norm bounded sequence. Represent
vectors $x_m$ as (53.1). By Lemma 53.1 sequences $\{\xi_k^{(m)}\}$ are
bounded for every $k = 1, 2, \ldots, n$. Choose, in the same way as in
the proof of Lemma 53.1, a subsequence $\{x_{m_n}\}$ of $\{x_m\}$ such that
there are limiting relations $\xi_k^{(m_n)} \to \xi_k^{(0)}$ for every $k$. It follows that
$\{x_{m_n}\}$ converges in the norm to vector (53.2).

This lemma is similar to the well-known Bolzano-Weierstrass 
theorem of mathematical analysis. It is of great importance for 
studies of any finite dimensional normed spaces. As an illustration, 
we prove some statements. 

Theorem 54.1. Any finite dimensional normed space is complete.

Proof. Let $\{x_m\}$ be a fundamental sequence. It is bounded. Choose
a convergent subsequence $\{x_{m_n}\}$ and denote by $x_0$ its limit. We
have

$$\| x_m - x_0 \| \le \| x_m - x_{m_n} \| + \| x_{m_n} - x_0 \|.$$

Take an arbitrary number $\varepsilon > 0$. Since $\{x_m\}$ is fundamental,
there is $N_1$ such that $\| x_m - x_{m_n} \| < \varepsilon/2$ for $m, m_n > N_1$. Since
the sequence $\{x_{m_n}\}$ converges to $x_0$, there is $N_2$ such that
$\| x_{m_n} - x_0 \| < \varepsilon/2$ for $m_n > N_2$. If $N$ is the larger of $N_1$
and $N_2$, then for $m > N$

$$\| x_m - x_0 \| < \varepsilon.$$

The number $\varepsilon$ is arbitrary. Hence the fundamental sequence $\{x_m\}$
converges in the norm to the vector $x_0$.

Lemma 54.2. Any finite dimensional subspace $X_0$ of a normed space
$X$ is a closed set.

Proof. Consider in a normed space $X$ a finite dimensional subspace
$X_0$. Let a vector $x \in X$ be a cluster point for $X_0$. This means that
there is a sequence $\{x_m\}$ of vectors of $X_0$, distinct from $x$, such that
$\| x_m - x \| \to 0$. The sequence $\{x_m\}$ is bounded and therefore it is
possible to choose a subsequence $\{x_{m_p}\}$ of $\{x_m\}$, converging, in view
of the completeness of $X_0$, to some vector $x_0 \in X_0$. Now we have

$$\| x - x_0 \| \le \| x - x_{m_p} \| + \| x_{m_p} - x_0 \| \to 0,$$

i.e. $x = x_0$.

Lemma 54.3. Let $X$ be a normed space and let $X_0$ be its finite dimensional
subspace distinct from it. There is a normed vector $x \notin X_0$ such
that $\| x - x_0 \| \ge 1$ for any vector $x_0 \in X_0$.

Proof. The subspace $X_0$ does not coincide with $X$ and therefore
there is a vector $x' \notin X_0$. Since $X_0$ is closed, we have

$$\inf_{x_0 \in X_0} \| x' - x_0 \| = d > 0. \tag{54.1}$$

By the definition of the greatest lower bound, for every $k = 1, 2, \ldots$
there is a vector $x_0^{(k)}$ in $X_0$ for which

$$d \le \| x' - x_0^{(k)} \| < d + \frac{1}{k}.$$

The sequence $\{x_0^{(k)}\}$ is bounded. Choose a subsequence $\{x_0^{(k_p)}\}$ of it,
converging, in view of the completeness of $X_0$, to some vector
$x_0' \in X_0$. For that vector obviously

$$\| x' - x_0' \| = d.$$

Set

$$x = \frac{1}{d} (x' - x_0'). \tag{54.2}$$

It is clear that $\| x \| = 1$. Moreover, if $x_0 \in X_0$, then by (54.1)

$$\| x - x_0 \| = \frac{1}{d} \| x' - (x_0' + d x_0) \| \ge \frac{1}{d} \cdot d = 1,$$

since the vector $x_0' + d x_0$ is in $X_0$.

Incidentally, we have proved that the lower bound (54.1) is
reached on at least one vector $x_0' \in X_0$. In $\| x - x_0 \| \ge 1$ equality
clearly holds for $x_0 = 0$.

To conclude, Lemma 54.1, which plays so great a part in finite 
dimensional spaces, holds in no infinite dimensional space. Namely, 
we have 

Lemma 54.4. If it is possible to choose a convergent subsequence of 
any bounded sequence of vectors of a normed space X, then X is finite 
dimensional. 

Proof. Suppose the contrary. Let $X$ be an infinite dimensional
space. Choose an arbitrary normed vector $x_1$ and denote by $L_1$ its
span. By Lemma 54.3 there is a normed vector $x_2$ such that
$\| x_2 - x_1 \| \ge 1$. Denote by $L_2$ the span of $x_1$ and $x_2$. Continuing the
reasoning we find a sequence $\{x_n\}$ of normed vectors satisfying
$\| x_n - x_k \| \ge 1$ for every $k < n$. Hence it is impossible to choose any
convergent subsequence of $\{x_n\}$. This contradicts the hypothesis of the
lemma and therefore the assumption that $X$ is infinite dimensional
was false.
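
A concrete instance of such a sequence may help. The sketch below assumes the space of real sequences with finitely many nonzero terms and the Euclidean norm (an example not taken from the text), truncated to the first `dim` coordinates so that it can be computed. The hypothetical helper `unit_vector` builds the coordinate vectors, which are normed and pairwise at distance $\sqrt{2}$.

```python
import math

def unit_vector(k, dim):
    """Return the k-th coordinate vector (1-based) truncated to dim entries."""
    return [1.0 if i == k - 1 else 0.0 for i in range(dim)]

def dist(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

dim = 6  # truncation length; the construction works for any dim
vectors = [unit_vector(k, dim) for k in range(1, dim + 1)]

# Every vector is normed, yet no two of them come closer than sqrt(2).
lengths = [dist(v, [0.0] * dim) for v in vectors]
min_gap = min(dist(vectors[i], vectors[j]) for i in range(dim) for j in range(i))
print(lengths, min_gap)
```

Since the gap does not shrink as `dim` grows, no subsequence of these vectors can be fundamental, in line with the lemma.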


Exercises 

1. Prove that a plane in a normed finite dimensional
space is a closed set.

2. Prove that a set of vectors $x$ of a finite dimensional space satisfying
$\| x \| \le \alpha$ is a closed set.

3. Prove that in a closed bounded set of vectors of a finite dimensional
space there are vectors on which both the lower and the upper bound of values
of any norm are attained.

4. Prove that given any two norms, $\| x \|_{\mathrm{I}}$ and $\| x \|_{\mathrm{II}}$, in a finite dimensional
space there are positive $\alpha$ and $\beta$ such that

$$\alpha \| x \|_{\mathrm{I}} \le \| x \|_{\mathrm{II}} \le \beta \| x \|_{\mathrm{I}}$$

for every vector $x$. The numbers $\alpha$ and $\beta$ are independent of $x$.


55. The limit and computational processes 

In complete metric spaces the concept of limit 
is widely used in constructing and justifying various computational 
processes. We consider as an example one method of solving systems 
of linear algebraic equations. 

Given a system of two equations in two unknowns, assume that it
is compatible and has a unique solution. For simplicity of presentation
suppose that all coefficients are real. Each of the equations

$$a_{11} x + a_{12} y = f_1,$$
$$a_{21} x + a_{22} y = f_2$$

of the system defines in the plane a straight line. The point $M$ of the
intersection of the straight lines gives a solution of the system
(Fig. 55.1).

Take a point $M_0$ lying on none of the straight lines in the plane.
Drop from it a perpendicular to either straight line. The foot $M_1$ of
the perpendicular is closer to $M$ than $M_0$ is, since a projection is
always shorter than the inclined line. Drop then a perpendicular
from $M_1$ to the other straight line. The foot $M_2$ of that perpendicular
is still closer to the solution. Successively projecting now onto
one straight line, now onto the other, we obtain a sequence $\{M_k\}$ of
points of the plane converging to $M$. It is important to note that the
sequence constructed converges for any choice of the initial point $M_0$.

This example suggests how to construct a computational process
to solve a system of linear algebraic equations of the general form
(48.2). We replace the problem by an equivalent problem of finding
vectors of the intersection of a system of hyperplanes (46.9). Suppose
the hyperplanes contain at least one common vector and assume
for simplicity that the vector space is real.

Choose a vector $v_0$ and project it onto the first hyperplane. Project
the resulting vector $v_1$ onto the second hyperplane and so on, returning
to the first hyperplane after the last one. This
process determines some sequence $\{v_p\}$. Let us investigate that
sequence.

Basic to the computational process is projecting some vector $v_p$
onto the hyperplane given by equation (46.8). It is clear that $v_{p+1}$
satisfies that equation and is related to $v_p$ by the equation

$$v_{p+1} = v_p + t n$$

for some number $t$. Substituting $v_{p+1}$ in (46.8) determines $t$. From
this we get

$$v_{p+1} = v_p + \frac{b - (n, v_p)}{(n, n)}\, n.$$

This formula says that all vectors of the sequence $\{v_p\}$ are in the
plane obtained by translating the span $L(n_1, n_2, \ldots, n_k)$ by
the vector $v_0$. But all vectors which are in the
intersection of hyperplanes (46.9) are in the plane obtained by translating
the orthogonal complement $L^{\perp}(n_1, n_2, \ldots, n_k)$. There is
a unique vector $z_0$ that is in both planes.

If we prove that some subsequence of $\{v_p\}$ converges to some vector
that is in hyperplanes (46.9), then by the closure of a plane it is
to $z_0$ that it converges. Moreover, so does the entire sequence $\{v_p\}$.

For any $r$ vectors $z_0 - v_{r+1}$ and $v_{r+1} - v_r$ are orthogonal and
therefore by the Pythagorean theorem

$$\rho^2(z_0, v_r) = \rho^2(z_0, v_{r+1}) + \rho^2(v_r, v_{r+1}).$$

Summing the equations obtained over $r$ from $0$ to $p - 1$ we find

$$\rho^2(z_0, v_0) = \rho^2(z_0, v_p) + \sum_{r=0}^{p-1} \rho^2(v_r, v_{r+1}).$$

Consequently

$$\sum_{r=0}^{p-1} \rho^2(v_r, v_{r+1}) \le \rho^2(z_0, v_0),$$

from which we conclude that

$$\rho(v_p, v_{p+1}) \to 0. \tag{55.1}$$

Denote by $H_r$ the hyperplane in the $r$th row of (46.9). It is clear
that the distance from $v_p$ to $H_r$ is not greater than the distance between
$v_p$ and any vector of $H_r$. By the construction of $\{v_p\}$, among
any $k$ consecutive vectors of $\{v_p\}$ there is, for each of the hyperplanes,
a vector that is in it. Using the triangle inequality and the
limiting relation (55.1) we get

$$\rho(v_p, H_r) \le \rho(v_p, v_{p+1}) + \rho(v_{p+1}, v_{p+2}) + \ldots + \rho(v_{p+k-1}, v_{p+k}) \to 0 \tag{55.2}$$

for every $r = 1, 2, \ldots, k$.

The sequence $\{v_p\}$ is obviously bounded. Choose some convergent
subsequence of it. Let that subsequence converge to a vector $z'$.
Proceeding in (55.2) to the limit we find that

$$\rho(z', H_r) = 0$$

for every $r = 1, 2, \ldots, k$. But, as already noted earlier, the vector
$z'$ must coincide with $z_0$. Consequently, $\{v_p\}$ converges to $z_0$.
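
The process of successive projections can be sketched in a few lines. The code below is an illustration under stated assumptions, not the text's own program: each hyperplane is assumed to be given as $(n, z) = b$ by a normal vector and a right-hand side, the names `project_onto_hyperplane` and `cyclic_projections` are hypothetical, and the fixed number of sweeps stands in for a proper stopping test. In the literature this scheme is commonly known as the Kaczmarz method.

```python
def dot(u, v):
    """Scalar product of two real coordinate vectors."""
    return sum(a * b for a, b in zip(u, v))

def project_onto_hyperplane(v, n, b):
    """Orthogonal projection of v onto the hyperplane (n, z) = b."""
    t = (b - dot(n, v)) / dot(n, n)
    return [vi + t * ni for vi, ni in zip(v, n)]

def cyclic_projections(normals, rhs, v0, sweeps=200):
    """Project onto the hyperplanes one after another, cycling `sweeps` times."""
    v = list(v0)
    for _ in range(sweeps):
        for n, b in zip(normals, rhs):
            v = project_onto_hyperplane(v, n, b)
    return v

# Assumed example system: 2x + y = 5, x + 3y = 10, whose solution is (1, 3).
normals = [[2.0, 1.0], [1.0, 3.0]]
rhs = [5.0, 10.0]
solution = cyclic_projections(normals, rhs, v0=[0.0, 0.0])
print(solution)
```

The starting vector is arbitrary here, which mirrors the remark that the sequence converges for any choice of the initial point.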

Exercises 

1. Were the concepts of completeness and closure actually
used in the above investigation?

2. How are other solutions of the system to be found if there are any?

3. How will the process behave if the system is incompatible?


Matrices and Linear Operators


56. Operators 

A major point in creating the foundations of
mathematical analysis is the introduction of the concept of function.
By definition, to specify a function it is necessary to indicate
two sets, $X$ and $Y$, of real numbers and to formulate a rule assigning
to each number $x \in X$ a unique number $y \in Y$. That rule is
a single-valued function of a real variable $x$ given on the set $X$.

In realizing the general idea of functional dependence it is not at 
all necessary to require that X and Y should be sets of real numbers. 
Understanding by X and Y various sets of elements we arrive at the 
following definition generalizing the concept of function. 

A rule assigning to each element $x$ of some nonempty set $X$ a unique
element $y$ of a nonempty set $Y$ is called an operator. A result $y$ of
applying an operator $A$ to an element $x$ is designated

$$y = A(x), \qquad y = Ax \tag{56.1}$$

and $A$ is said to be an operator from $X$ to $Y$ or to map $X$ into $Y$.

The set $X$ is said to be the domain of $A$. An element $y$ of (56.1) is
the image of an element $x$ and $x$ is the inverse image of $y$. The collection
$T_A$ of all images is the range (or image) of $A$. If each element
$y \in Y$ has only one inverse image, then operator (56.1) is said to
be 1-1. An operator is also called a mapping, transformation or
operation.

In what follows we shall mainly consider the so-called linear operators.
Their distinctive features are as follows. First, the domain of
a linear operator is always some vector space or linear subspace.
Second, the properties of a linear operator are closely related to
operations on vectors of a vector space. As a rule in our study of
linear operators we shall assume that spaces are given over a field
of real or complex numbers. Unless otherwise stated, operator will
mean linear operator. In the general theory of operators linear operators
play as important a part as the straight line and the plane do
in mathematical analysis. That is why they require a detailed study.

Let $X$ and $Y$ be vector spaces over the same field $P$. Consider an
operator $A$ whose domain is $X$ and whose range is some set
in $Y$. The operator $A$ is said to be linear if

$$A(\alpha u + \beta v) = \alpha Au + \beta Av \tag{56.2}$$

for any vectors $u, v \in X$ and any numbers $\alpha, \beta \in P$.

We have already repeatedly encountered linear operators. According
to (9.8) a linear operator is the magnitude of a directed line
segment. Its domain is the set of all directed line segments of an
axis and its range is the set of all real numbers. Also a linear operator
is by (21.2) an isomorphism between two vector spaces. Fix in
a vector space with a scalar product some subspace $L$. We obtain
two linear operators if each vector of the space is assigned either its
projection onto $L$ or the perpendicular from that vector to $L$. This
follows from (30.5) and (30.6).
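
Condition (56.2) can be checked directly for the projection example. The sketch below assumes a three-dimensional real space with the standard scalar product and takes for $L$ the span of a single vector `a`; the helper names and the sample vectors are illustrative assumptions. Exact rational arithmetic is used so that the two sides can be compared for equality.

```python
from fractions import Fraction

def dot(u, v):
    """Standard scalar product of two coordinate vectors."""
    return sum(x * y for x, y in zip(u, v))

def project_onto_line(x, a):
    """Projection of x onto the span of the nonzero vector a."""
    c = Fraction(dot(x, a)) / Fraction(dot(a, a))
    return [c * ai for ai in a]

def perpendicular_to_line(x, a):
    """Perpendicular from x to the span of a, i.e. x minus its projection."""
    return [xi - pi for xi, pi in zip(x, project_onto_line(x, a))]

def combo(alpha, u, beta, v):
    """The linear combination alpha*u + beta*v."""
    return [alpha * ui + beta * vi for ui, vi in zip(u, v)]

a = [1, 2, 2]
u, v = [3, -1, 4], [0, 5, -2]
alpha, beta = Fraction(2), Fraction(-3, 7)

# Compare A(alpha*u + beta*v) with alpha*Au + beta*Av for both operators.
checks = {}
for name, op in (("projection", project_onto_line),
                 ("perpendicular", perpendicular_to_line)):
    lhs = op(combo(alpha, u, beta, v), a)
    rhs = combo(alpha, op(u, a), beta, op(v, a))
    checks[name] = lhs == rhs
print(checks)
```

A check on sample vectors is of course not a proof; it only illustrates what (56.2) asserts.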

An operator assigning to each vector $x$ of $X$ the zero vector of $Y$
is obviously a linear operator. It is called the zero operator and
designated $0$. So

$$0 = 0x.$$

Assigning to each vector $x \in X$ the same vector $x$ yields a linear
operator $E$ from $X$ to $X$. This is called the identity or unit operator.
By definition

$$x = Ex.$$

Let $A$ be some operator from $X$ to $Y$. We construct a new operator
$B$ according to the prescription $Bx = -Ax$. The resulting
operator $B$ is also a linear operator from $X$ to $Y$. It is said to be
opposite to $A$.

Finally fix an arbitrary number $\alpha$ and assign to each vector $x \in X$
a vector $\alpha x \in X$. The resulting operator is certainly a linear operator.
It is called a scalar operator. When $\alpha = 0$, we obtain a zero
operator, and when $\alpha = 1$, we obtain an identity operator.

We shall soon describe a general method for constructing linear
operators, and now we point out some of their characteristic features.
By (56.2)

$$A\Big(\sum_{i=1}^{p} \alpha_i x_i\Big) = \sum_{i=1}^{p} \alpha_i A x_i$$

for any vectors $x_i$ and any numbers $\alpha_i$. In particular, it follows that
any linear operator $A$ transforms a zero vector into a zero vector, i.e.

$$0 = A0.$$

The range $T_A$ of a linear operator $A$ is a subspace of $Y$. If $z = Au$
and $w = Av$, then a vector $\alpha z + \beta w$ is clearly the image of a vector
$\alpha u + \beta v$ for any numbers $\alpha$ and $\beta$. Hence the vector $\alpha z + \beta w$ is
in the range of $A$. The dimension of $T_A$ is called the rank of $A$ and
denoted by $r_A$.

Along with $T_A$ consider the set $N_A$ of all vectors $x \in X$ satisfying

$$Ax = 0.$$

It is also a subspace and is called the kernel or null space of $A$. The
dimension $n_A$ of the kernel is called the nullity of $A$.

The rank and nullity are not independent characteristics of a linear
operator $A$. Let $X$ have a dimension $m$. Decompose it as a direct
sum

$$X = N_A \dotplus M_A, \tag{56.3}$$

where $N_A$ is the kernel of $A$ and $M_A$ is any complementary subspace.
Take a vector $x \in X$ and represent it as a sum

$$x = x_N + x_M,$$

where $x_N \in N_A$ and $x_M \in M_A$. If $y = Ax$, then since $A$ is linear
and $Ax_N = 0$

$$y = Ax_N + Ax_M = Ax_M.$$

Hence any vector from $T_A$ has at least one inverse image in $M_A$.

In fact, that inverse image is unique in $M_A$. Suppose for some
vector $y \in T_A$ we have two inverse images, $x_M', x_M'' \in M_A$. Since $M_A$
is a subspace, $x_M' - x_M'' \in M_A$. Since $x_M'$ and $x_M''$ are the inverse
images of the same vector $y$, $x_M' - x_M'' \in N_A$. Subspaces $M_A$ and
$N_A$ have only a zero vector in common. Therefore $x_M' - x_M'' = 0$,
i.e. $x_M' = x_M''$.

Thus the operator $A$ establishes a 1-1 correspondence between the
vectors of $T_A$ and $M_A$. By virtue of the linearity of $A$ this correspondence
is an isomorphism. Hence the dimensions of $T_A$ and $M_A$
coincide and are equal to $r_A$. Now it follows from (56.3) that

$$r_A + n_A = m. \tag{56.4}$$
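
Relation (56.4) can be verified on a small example. The sketch below rests on assumptions that are not in the text at this point: the operator acts from a four-dimensional to a three-dimensional coordinate space and is specified by a table of coefficients `A`, so that the images of the basis vectors are the columns of the table. The rank is computed from those images and the nullity from an explicitly constructed basis of the kernel; `rref`, `kernel_basis` and `apply_operator` are hypothetical helper names.

```python
from fractions import Fraction

def rref(rows):
    """Gauss-Jordan elimination; returns (reduced rows, pivot column indices)."""
    m = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(len(m[0])):
        if r == len(m):
            break
        pivot = next((i for i in range(r, len(m)) if m[i][c] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        lead = m[r][c]
        m[r] = [x / lead for x in m[r]]
        for i in range(len(m)):
            if i != r and m[i][c] != 0:
                factor = m[i][c]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        pivots.append(c)
        r += 1
    return m, pivots

def kernel_basis(rows):
    """Basis of all x with Ax = 0, one vector per non-pivot column."""
    m, pivots = rref(rows)
    ncols = len(rows[0])
    basis = []
    for f in (c for c in range(ncols) if c not in pivots):
        x = [Fraction(0)] * ncols
        x[f] = Fraction(1)
        for row_index, p in enumerate(pivots):
            x[p] = -m[row_index][f]
        basis.append(x)
    return basis

def apply_operator(rows, x):
    """Image Ax of the coordinate vector x."""
    return [sum(Fraction(a) * xi for a, xi in zip(row, x)) for row in rows]

A = [[1, 2, 0, 1],
     [2, 4, 1, 3],
     [3, 6, 1, 4]]
m_dim = len(A[0])                          # dimension of X
images = [list(col) for col in zip(*A)]    # images of the basis vectors
rank = len(rref(images)[1])                # dimension of their span
kernel = kernel_basis(A)
nullity = len(kernel)
print(rank, nullity, m_dim)
```

Every vector returned by `kernel_basis` is sent to zero by `apply_operator`, and the rank and nullity add up to the dimension of $X$.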

Note that the linear operator $A$ establishes an isomorphism between
$T_A$ and any $M_A$ of $X$ that in the direct sum with the kernel of $A$
constitutes the entire space $X$. It may therefore be assumed that
every linear operator $A$ generates an entire family of other linear
operators. First, it is the zero operator defined on the kernel $N_A$,
i.e. the one from $N_A$ to $0$. Second, it is a set of linear operators from
the subspaces $M_A$ complementary to the kernel to the subspace $T_A$.
It is a very important fact that each of the new operators coincides
on its domain with $A$. If $N_A = 0$, then $M_A = X$ and the entire
second set of operators coincides with $A$. If $N_A = X$, however,
then $A$ is a zero operator. We shall continue this discussion later on.


Exercises

Prove that the following operators are linear.

1. A basis is given in a vector space $X$. An operator $A$ assigns to each vector
$x \in X$ its coordinate with a fixed index.

2. A vector $x_0$ is fixed in a space $X$ with a scalar product. An operator $A$
assigns to each vector $x \in X$ a scalar product $(x, x_0)$.

3. A vector $x_0$ is fixed in a space $V_3$. An operator $A$ assigns to each vector
$x \in V_3$ a vector product $[x, x_0]$.

4. A space $X$ is formed by polynomials with real coefficients. An operator $A$
assigns to each polynomial its $k$th derivative. It is called an operator of $k$-fold
differentiation.

5. In a space of polynomials dependent on a variable $t$, an operator $A$ assigns
to each polynomial $P(t)$ a polynomial $t \cdot P(t)$.

6. A space $X$ is decomposed as a direct sum of subspaces $S$ and $T$. Represent
each vector $x \in X$ as a sum $x = u + v$, where $u \in S$ and $v \in T$. An operator $A$
assigns to a vector $x$ a vector $u$. It is called an operator of projection onto $S$ parallel
to $T$.


57. The vector space of operators 

Fix two vector spaces, $X$ and $Y$, over the
same field $P$ and consider the set $\omega_{XY}$ of all linear operators from
$X$ to $Y$. In $\omega_{XY}$ we can introduce the operations of operator addition
and of multiplication of an operator by a number from $P$ turning
thereby $\omega_{XY}$ into a vector space.

Two operators, $A$ and $B$, from $X$ to $Y$ are said to be equal if

$$Ax = Bx$$

for each vector $x \in X$. It is easy to verify that the equality relation
of the operators is an equivalence relation. The equality of operators
is designated

$$A = B.$$

An operator $C$ is said to be the sum of operators $A$ and $B$ from
$X$ to $Y$ if

$$Cx = Ax + Bx$$

for each $x \in X$. A sum of operators is designated

$$C = A + B.$$

By definition it is possible to add any operators from $X$ to $Y$.
If $A$ and $B$ are linear operators of $\omega_{XY}$, then so is their sum. For
any vectors $u, v \in X$ and any numbers $\alpha, \beta \in P$ we have

$$\begin{aligned}
C(\alpha u + \beta v) &= A(\alpha u + \beta v) + B(\alpha u + \beta v) = \alpha Au + \beta Av + \alpha Bu + \beta Bv \\
&= \alpha (Au + Bu) + \beta (Av + Bv) = \alpha Cu + \beta Cv.
\end{aligned}$$

The addition of operators is an algebraic operation. It is also associative.
Indeed, let $A$, $B$ and $C$ be three arbitrary linear operators
of $\omega_{XY}$. Then for any vector $x \in X$

$$\begin{aligned}
((A + B) + C) x &= (A + B) x + Cx = Ax + Bx + Cx \\
&= Ax + (Bx + Cx) = Ax + (B + C) x = (A + (B + C)) x.
\end{aligned}$$

But this means that

$$(A + B) + C = A + (B + C).$$

The addition of operators is a commutative operation. If $A$ and $B$
are any operators of $\omega_{XY}$ and $x$ is any vector of $X$, then

$$(A + B) x = Ax + Bx = Bx + Ax = (B + A) x,$$

i.e.

$$A + B = B + A.$$

Now it is easy to show that the set $\omega_{XY}$, with the operator addition
introduced in it, is an Abelian group. It has at least one zero
element, for example a zero operator. Each element of $\omega_{XY}$ has at
least one opposite element, for example an opposite operator. Everything
else follows from Theorem 7.1.

It follows from the same theorem that the addition of operators
has an inverse. We shall call it subtraction and use the generally
accepted notation and properties.

An operator $C$ is said to be the product of an operator $A$ from $X$
to $Y$ by a number $\lambda$ from a field $P$ if

$$Cx = \lambda \cdot Ax$$

for each $x \in X$. This product is designated

$$C = \lambda A.$$

A product of a linear operator of $\omega_{XY}$ by a number is again a linear
operator of $\omega_{XY}$. Indeed, for any vectors $u, v \in X$ and any numbers
$\alpha, \beta \in P$ we have

$$C(\alpha u + \beta v) = \lambda A(\alpha u + \beta v) = \lambda (\alpha Au + \beta Av) = \alpha (\lambda Au) + \beta (\lambda Av) = \alpha Cu + \beta Cv.$$

It is not hard to show that the addition of operators and the multiplication
of an operator by a number satisfy all properties that define
a vector space. Hence the set $\omega_{XY}$ of all linear operators from a vector
space $X$ to a vector space $Y$ forms a new vector space. It follows
that from the point of view of the operations of multiplication of
an operator by a number and of addition and subtraction of operators
all the rules of equivalent transformations of operator algebraic
expressions are valid. In what follows these rules will no longer be
stated explicitly.

Note that nowhere did we use the relation between vector
spaces $X$ and $Y$. They may be both distinct and coincident. The set
$\omega_{XX}$ of linear operators from a space $X$ to the same space $X$ will be
one of the main objects of our studies. We shall call them linear
operators in $X$.
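
The two operations just introduced translate naturally into code when operators are modelled as functions. The following sketch assumes operators from a two-dimensional to a three-dimensional real coordinate space; the operators `A` and `B` and the helpers `add`, `scale` and `combo` are hypothetical examples chosen for illustration.

```python
def add(A, B):
    """Sum of operators: (A + B)x = Ax + Bx."""
    return lambda x: tuple(a + b for a, b in zip(A(x), B(x)))

def scale(lam, A):
    """Product of an operator by a number: (lam A)x = lam * Ax."""
    return lambda x: tuple(lam * a for a in A(x))

def combo(alpha, u, beta, v):
    """The linear combination alpha*u + beta*v of coordinate tuples."""
    return tuple(alpha * a + beta * b for a, b in zip(u, v))

def A(x):
    """An assumed linear operator from the plane to three-space."""
    return (x[0] + 2 * x[1], 3 * x[1], x[0])

def B(x):
    """A second assumed linear operator between the same spaces."""
    return (x[1], -x[0], x[0] + x[1])

C = add(scale(2, A), B)          # the operator 2A + B
u, v = (1, -2), (4, 5)
alpha, beta = 3, -7

# Linearity of C and commutativity of addition, checked on sample vectors.
sum_is_linear = C(combo(alpha, u, beta, v)) == combo(alpha, C(u), beta, C(v))
addition_commutes = add(A, B)(u) == add(B, A)(u)
print(sum_is_linear, addition_commutes)
```

Because integer arithmetic is exact, the comparisons are genuine equalities rather than approximate ones.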


Exercises 

1. Prove that multiplying an operator by a nonzero
number leaves its rank and nullity unaffected.

2. Prove that the rank of a sum of operators is at most a sum of the ranks
of the summands.

3. Prove that a set of linear operators of $\omega_{XY}$ whose ranges are in the same
subspace is a linear subspace.

4. Prove that a system of two nonzero operators of $\omega_{XY}$ whose ranges are
distinct is linearly independent.

5. Prove that a space of linear operators in $V_1$ is one-dimensional.


58. The ring of operators 

Consider three vector spaces $X$, $Y$ and $Z$
over the same field $P$. Let $A$ be an operator from $X$ to $Y$ and
let $B$ be an operator from $Y$ to $Z$.

An operator $C$ from $X$ to $Z$ is said to be a product of $B$ by $A$ if

$$Cx = B(Ax)$$

for each vector $x \in X$. A product of $B$ and $A$ is designated

$$C = BA.$$

A product of linear operators is again a linear operator. For any
vectors $u, v \in X$ and any numbers $\alpha, \beta \in P$ we have

$$\begin{aligned}
C(\alpha u + \beta v) &= B(A(\alpha u + \beta v)) = B(\alpha Au + \beta Av) \\
&= \alpha B(Au) + \beta B(Av) = \alpha Cu + \beta Cv.
\end{aligned}$$

The multiplication of operators is not an algebraic operation if
only because a product is not defined for every pair of operators.
Nevertheless, when it is realizable, operator multiplication possesses quite definite
properties. Namely:

$$\begin{aligned}
&(1)\quad (AB) C = A (BC), \\
&(2)\quad \lambda (BA) = (\lambda B) A = B (\lambda A), \\
&(3)\quad (A + B) C = AC + BC, \\
&(4)\quad A (B + C) = AB + AC
\end{aligned} \tag{58.1}$$

for any operators $A$, $B$ and $C$ and any number $\lambda$ from $P$ if of course
the corresponding expressions are defined.

The proofs of all these properties are similar, and therefore we
shall restrict ourselves to the study of the first property. Let $X$, $Y$,
$Z$ and $U$ be fixed vector spaces, and let $A$, $B$ and $C$ be any linear
operators, where $C$ is an operator from $X$ to $Y$, $B$ is an operator from
$Y$ to $Z$, and $A$ is an operator from $Z$ to $U$. Observe first of all that in
equation (1) both operators, $(AB) C$ and $A (BC)$, are defined. For
any vector $x \in X$ we have

$$((AB) C) x = AB (Cx) = A (B (Cx)),$$

$$(A (BC)) x = A (BCx) = A (B (Cx)),$$

which shows the validity of equation (1).

Again consider the set $\omega_{XX}$ of linear operators in $X$. For any two
operators of $\omega_{XX}$ both a sum and a product are defined. According
to properties (3) and (4) both operations are connected by the distributive
law. The set $\omega_{XX}$ of linear operators is therefore a ring. It
will be shown in what follows that a ring of operators is noncommutative.
It may of course happen that for some particular pair of
operators $A$ and $B$ the relation $AB = BA$ does hold. Such operators
will be called commutative. In particular, an identity operator is
commutative with any operator.
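
A pair of operators in the plane that do not commute is easy to exhibit. The sketch below is an illustration with assumed operators `shear` and `swap` acting on coordinate pairs; the helper `compose` follows the convention of the text that the product $BA$ applies $A$ first.

```python
def compose(B, A):
    """Product BA of operators: (BA)x = B(Ax)."""
    return lambda x: B(A(x))

def shear(x):
    """Assumed operator: adds the second coordinate to the first."""
    return (x[0] + x[1], x[1])

def swap(x):
    """Assumed operator: interchanges the two coordinates."""
    return (x[1], x[0])

def identity(x):
    """The identity operator E."""
    return x

x = (1, 2)
shear_after_swap = compose(shear, swap)(x)   # shear applied to swap(x)
swap_after_shear = compose(swap, shear)(x)   # swap applied to shear(x)
commutes_with_identity = compose(shear, identity)(x) == compose(identity, shear)(x)
print(shear_after_swap, swap_after_shear, commutes_with_identity)
```

The two products give different images of the same vector, so the operators are not commutative, while each of them commutes with the identity operator.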

In a ring of linear operators, as in any other ring, a product of
any operator by a zero operator is again a zero operator. The distributive
law relates to multiplication not only a sum of operators but
also their difference. A ring of linear operators is at the same time
a vector space and therefore for a difference of operators we have

$$A - B = A + (-1) B.$$

Property (2) of (58.1) describes the relation of multiplication of operators
in a ring to multiplication by a number. Remaining valid
of course are all the relations following from the properties of vector
spaces.


Exercises 

1. In a space of polynomials in $t$, denote by $D$ an operator
of differentiation and by $T$ an operator of multiplication by $t$. Prove that
$DT \ne TD$. Find the operator $DT - TD$.

2. Fix some operator $B$ of the space $\omega_{XX}$. Prove that the set of operators $A$
such that $BA = 0$ is a subspace in $\omega_{XX}$.

3. Prove that the rank of a product of operators is not greater than the rank
of each of the factors.

4. Prove that the nullity of a product of operators is not less than the nullity
of each of the factors.

5. Prove that in the ring $\omega_{XX}$ of linear operators there are zero divisors.


59. The group of nonsingular operators

Linear operators in a space X form an Abelian 
group relative to addition. But we can find among such operators 
sets that are groups relative to multiplication. These are connected 
with the so-called nonsingular operators. 

An operator in a vector space is said to be nonsingular if its kernel 
consists only of a zero vector. An operator that is not nonsingular is 
termed singular. 

Nonsingular operators are, for example, the identity operator and
the scalar operator, provided it is not zero. Sometimes it is possible
to associate with an operator $A$ in $X$ some nonsingular operator even
in the case where $A$ is singular. Indeed, let $T_A$ be the range of $A$
and let $N_A$ be its kernel. If $T_A$ and $N_A$ have no nonzero vectors in
common, then by (56.4) we have

$$X = N_A \dotplus T_A.$$

As already noted, an operator $A$ generates many other operators
from any subspace complementary to $N_A$ to the subspace of values
$T_A$. In the case considered $A$ generates an operator from $T_A$ to $T_A$.
That operator is nonsingular since it sends to zero only the zero
vector of $T_A$.

Nonsingular operators possess many remarkable properties. For
such operators the nullity is zero and therefore it follows from (56.4)
that the rank of a nonsingular operator equals the dimension of the
space. If a nonsingular operator $A$ is an operator in $X$, then its
range $T_A$ coincides with $X$. Thus each vector of $X$ is the image of
some vector of $X$. This property of a nonsingular operator is equivalent
to its definition.

An important property of the nonsingular operator is the uniqueness
of the inverse image for any vector of a space. Indeed, suppose that
for some vector $y$ there are two inverse images, $u$ and $v$. This
means that

$$Au = y, \qquad Av = y.$$

But then

$$A (u - v) = 0.$$

By the definition of a nonsingular operator its kernel consists only
of a zero vector. Therefore $u - v = 0$, i.e. $u = v$. This property is
also equivalent to the definition of a nonsingular operator. Actually
it has already been noted in Section 56.

A product of any finite number of nonsingular operators is also
a nonsingular operator. Obviously it suffices to prove this assertion
for two operators. Let $A$ and $B$ be any nonsingular operators in the
same space $X$. Consider the equation

$$BAx = 0. \tag{59.1}$$

According to the definition of operator multiplication this equation
means that

$$B (Ax) = 0.$$

The operator $B$ is nonsingular and therefore it follows from the last
equation that $Ax = 0$. But $A$ is also nonsingular and therefore
$x = 0$. So only a zero vector satisfies (59.1), i.e. the operator $BA$
is nonsingular.

A sum of nonsingular operators is not necessarily a nonsingular 
operator. If A is a nonsingular operator, then so is the operator 
(—1) A. But the sum of these operators is a zero operator which 
is singular. 

Consider a set of nonsingular operators in the same vector space.
On that set the multiplication of operators is an algebraic and associative
operation. Among nonsingular operators is also the identity
operator $E$ which plays the part of unity. Indeed it is easy to verify
that for any operator $A$ in $X$

$$AE = EA = A.$$

If we show that for any nonsingular operator $A$ there is a nonsingular
operator such that a product of it and $A$ is an identity operator,
then this will mean that the set of all nonsingular operators is
a group relative to multiplication.

Let $A$ be a nonsingular operator. As we know, for each vector
$y \in X$ there is one and only one vector $x \in X$ connected with $y$ by
the relation

$$y = Ax. \tag{59.2}$$

Consequently, it is possible to assign to each vector $y \in X$ a unique
vector $x \in X$ such that $y$ is its image. The relation constructed is some
operator. It is called the inverse of the operator $A$ and designated $A^{-1}$.
If (59.2) holds, then

$$x = A^{-1} y. \tag{59.3}$$

We prove that an inverse operator is linear and nonsingular.

A product is defined for any operators and not only for linear
operators. Therefore it follows from the definition of an inverse
operator that

$$A^{-1} A = A A^{-1} = E. \tag{59.4}$$

To prove these equations it suffices to apply the operator $A^{-1}$ to both
sides of (59.2) and the operator $A$ to both sides of (59.3).

Take any vectors $u, v \in X$ and any numbers $\alpha, \beta \in P$ and consider
the vector

$$z = A^{-1} (\alpha u + \beta v) - \alpha A^{-1} u - \beta A^{-1} v.$$

Now apply $A$ to both sides of the equation. From the linearity of $A$
and from (59.4) we conclude that $Az = 0$. Since $A$ is nonsingular,
this means that $z = 0$. Consequently,

$$A^{-1} (\alpha u + \beta v) = \alpha A^{-1} u + \beta A^{-1} v,$$

i.e. $A^{-1}$ is linear.

It is easy to show that $A^{-1}$ is nonsingular. For any vector $y$ from
the kernel of $A^{-1}$ we have

$$A^{-1} y = 0.$$

Apply $A$ to both sides of this equation. Since $A$ is a linear operator,
$A0 = 0$. Considering (59.4) we conclude that $y = 0$. So the kernel
of $A^{-1}$ consists only of a zero vector, i.e. $A^{-1}$ is a nonsingular operator.
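
The properties of the inverse operator can be observed on a small example. The sketch below assumes a particular nonsingular operator `A` in the plane and an explicit formula `A_inv` obtained by solving $y = Ax$ for $x$; both are illustrative choices rather than anything prescribed by the text.

```python
def A(x):
    """Assumed nonsingular operator: (x1, x2) -> (2*x1 + x2, x1 + x2)."""
    return (2 * x[0] + x[1], x[0] + x[1])

def A_inv(y):
    """Its inverse, found by solving 2*x1 + x2 = y1, x1 + x2 = y2."""
    return (y[0] - y[1], 2 * y[1] - y[0])

def combo(alpha, u, beta, v):
    """The linear combination alpha*u + beta*v of coordinate pairs."""
    return tuple(alpha * a + beta * b for a, b in zip(u, v))

samples = [(1, 0), (0, 1), (3, -4), (-2, 7)]

# Relation (59.4): applying A and its inverse in either order changes nothing.
round_trips_ok = all(A_inv(A(x)) == x and A(A_inv(x)) == x for x in samples)

# Linearity of the inverse operator on sample data.
u, v, alpha, beta = (3, -4), (-2, 7), 5, -3
inverse_is_linear = (A_inv(combo(alpha, u, beta, v))
                     == combo(alpha, A_inv(u), beta, A_inv(v)))
print(round_trips_ok, inverse_is_linear)
```

Only the zero vector is sent to zero by `A_inv`, which matches the conclusion that the inverse operator is nonsingular.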

Thus the set of nonsingular operators is a group relative to multiplication.
It will be shown somewhat later that this group is noncommutative.

Using nonsingular operators it is possible to construct commutative
groups too. Let $A$ be an arbitrary operator in a space $X$. For any
positive integer $p$ we define the $p$th power of $A$ by the equation

$$A^p = \underbrace{A \cdot A \cdots A}_{p}, \tag{59.5}$$

where the right-hand side contains $p$ multipliers. By virtue of the
associativity of the operation of multiplication the operator $A^p$ is
uniquely defined. Of course it is linear.

For any positive integers $p$ and $r$ it follows from (59.5) that

$$A^p A^r = A^{p+r}. \tag{59.6}$$

If it is assumed by definition that

$$A^0 = E$$

for any operator $A$, then formula (59.6) will hold for any nonnegative
integers $p$ and $r$.

Suppose $A$ is a nonsingular operator. Then so is the operator $A^r$
for any nonnegative $r$. Hence there is an inverse operator for $A^r$.
By (7.2) and (59.5) we have

$$(A^r)^{-1} = (A^{-1})^r = \underbrace{A^{-1} A^{-1} \cdots A^{-1}}_{r}. \tag{59.7}$$

Also it is assumed by definition that

$$A^{-r} = (A^r)^{-1}.$$

Taking into account formulas (59.5) and (59.7) and the fact that
$A A^{-1} = A^{-1} A$, it is not hard to prove the relation

$$A^p A^{-r} = A^{-r} A^p$$

for any nonnegative integers $p$ and $r$. This means that formula (59.6)
holds for any integers $p$ and $r$.

Now take a nonsingular operator $A$ and make up a set $\omega_A$ of
operators of the form $A^p$ for all integers $p$. On that set the multiplication
of operators is an algebraic and, as (59.6) implies, commutative
operation. Every operator $A^p$ has an inverse equal to $A^{-p}$. Contained
in $\omega_A$ is also an identity operator $E$. Hence $\omega_A$ is a commutative
group relative to multiplication. It is called a cyclic group generated
by the operator $A$.
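
A small cyclic group can be generated explicitly. The sketch below assumes that $A$ is the rotation of the plane through a right angle, acting on coordinate pairs; the function `power` is a hypothetical helper that builds $A^p$ for any integer $p$ from the definitions $A^0 = E$ and $A^{-r} = (A^{-1})^r$.

```python
def A(x):
    """Assumed nonsingular operator: rotation of the plane by 90 degrees."""
    return (-x[1], x[0])

def A_inv(x):
    """The inverse operator: rotation by 90 degrees in the opposite sense."""
    return (x[1], -x[0])

def power(p):
    """Return the operator A^p for any integer p."""
    step = A if p >= 0 else A_inv
    def op(x):
        for _ in range(abs(p)):
            x = step(x)
        return x
    return op

x = (3, 5)

# Formula (59.6) for positive, zero and negative exponents, on a sample vector.
law_holds = all(power(p)(power(r)(x)) == power(p + r)(x)
                for p in range(-6, 7) for r in range(-6, 7))

# Images of x under A^0, A^1, A^2, A^3; the next power returns to x.
orbit = [power(p)(x) for p in range(4)]
print(law_holds, orbit, power(4)(x) == x)
```

For this choice of $A$ the fourth power coincides with the identity operator, so the group consists of four distinct operators.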


Exercises 

1. Prove that if for two linear operators $A$ and $B$
of $\omega_{XX}$ we have $AB = E$, then both operators are nonsingular.

2. Prove that for operators $A$ and $B$ of $\omega_{XX}$ to be nonsingular it is necessary
and sufficient that so should operators $AB$ and $BA$.

3. Prove that if an operator $A$ is nonsingular and a number $\alpha \ne 0$, then the
operator $\alpha A$ is also nonsingular and $(\alpha A)^{-1} = (1/\alpha) A^{-1}$.

4. Prove that $T_A \subseteq N_A$ if and only if $A^2 = 0$.

5. Prove that for any operator $A$

$$N_A \subseteq N_{A^2} \subseteq N_{A^3} \subseteq \ldots, \qquad T_A \supseteq T_{A^2} \supseteq T_{A^3} \supseteq \ldots.$$

6. Prove that an operator $P$ is a projection operator if and only if $P^2 = P$.
What are subspaces $N_P$ and $T_P$?

7. Prove that if $P$ is a projection operator, then $E - P$ is also a projection
operator.

8. Prove that if an operator $A$ satisfies $A^m = 0$ for some positive integer $m$,
then the operator $\alpha E - A$ is nonsingular for any number $\alpha \ne 0$.

9. Prove that a linear operator $A$ for which $E + \alpha_1 A + \alpha_2 A^2 + \ldots + \alpha_n A^n = 0$
is nonsingular.

10. Prove that if $A$ is a nonsingular operator, then either all operators in
the cyclic group $\omega_A$ are distinct or some power of $A$ coincides with the identity
operator.


60. The matrix of an operator 

We discuss one general method of constructing
a linear operator from an m-dimensional space X to an n-dimensional
space Y. Suppose the vectors of a basis $e_1, e_2, \dots, e_m$ of X
are assigned some vectors $f_1, f_2, \dots, f_m$ of Y. Then there is a unique
linear operator A from X to Y which sends every vector $e_k$ to the
corresponding vector $f_k$.

Suppose that the desired operator A exists. Take a vector $x \in X$
and represent it as

$$x = \xi_1 e_1 + \xi_2 e_2 + \dots + \xi_m e_m.$$

Then

$$Ax = A\Big(\sum_{k=1}^{m} \xi_k e_k\Big) = \sum_{k=1}^{m} \xi_k A e_k = \sum_{k=1}^{m} \xi_k f_k.$$

The right-hand side of these relations is uniquely defined by x and
the images of the basis vectors. Therefore the equation obtained proves the
uniqueness of the operator A if it exists. On the other hand, we can
define the operator A by this equation, i.e. we can put

$$Ax = \sum_{k=1}^{m} \xi_k f_k.$$

It is easy to verify that the operator obtained is a linear operator
from X to Y sending every vector $e_k$ to the corresponding vector $f_k$.
The range $T_A$ of A coincides with the span of the system of vectors
$f_1, f_2, \dots, f_m$.

We can now draw an important conclusion: a linear operator A 
from a space X to a space Y is completely defined by the collection of 
images 

$$Ae_1, Ae_2, \dots, Ae_m$$

of any fixed basis

$$e_1, e_2, \dots, e_m$$

of X.

Fix a basis $e_1, e_2, \dots, e_m$ in X and a basis $q_1, q_2, \dots, q_n$ in Y.
A vector $e_1$ is sent by A to some vector $Ae_1$ of Y which, as any vector
of that space, can be expanded with respect to the basis vectors

$$Ae_1 = a_{11} q_1 + a_{21} q_2 + \dots + a_{n1} q_n.$$

Similarly

$$Ae_2 = a_{12} q_1 + a_{22} q_2 + \dots + a_{n2} q_n,$$
$$\dots$$
$$Ae_m = a_{1m} q_1 + a_{2m} q_2 + \dots + a_{nm} q_n.$$

The coefficients $a_{ij}$ of these relations define an n x m matrix

$$A_{qe} = \begin{pmatrix} a_{11} & a_{12} & \dots & a_{1m} \\ a_{21} & a_{22} & \dots & a_{2m} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \dots & a_{nm} \end{pmatrix}$$

called the matrix of the operator A relative to the chosen bases.

The columns of the matrix of the operator are the coordinates of
the vectors $Ae_1, Ae_2, \dots, Ae_m$ relative to the basis $q_1, q_2, \dots, q_n$.
To determine an element $a_{ij}$ of the matrix of A we should apply A
to the vector $e_j$ and take the ith coordinate of the image $Ae_j$. If
by $\{z\}_i$ we denote for brevity the ith coordinate of a vector z, then
$a_{ij} = \{Ae_j\}_i$. This method of determining the elements of the matrix
of an operator will be made use of in what follows.
Consider a vector $x \in X$ and its image y = Ax. We show how the
coordinates of the vector y can be expressed in terms of the coordinates
of x and the elements of the matrix of the operator. Let

$$x = \sum_{j=1}^{m} \xi_j e_j, \qquad y = \sum_{i=1}^{n} \eta_i q_i. \qquad (60.1)$$

Obviously

$$Ax = A\Big(\sum_{j=1}^{m} \xi_j e_j\Big) = \sum_{j=1}^{m} \xi_j A e_j = \sum_{j=1}^{m} \xi_j \sum_{i=1}^{n} a_{ij} q_i = \sum_{i=1}^{n} \Big(\sum_{j=1}^{m} a_{ij} \xi_j\Big) q_i.$$

Comparing the right-hand side of these equations with expansion
(60.1) for y we conclude that the equations

$$\sum_{j=1}^{m} a_{ij} \xi_j = \eta_i$$

must hold for i = 1, 2, ..., n, i.e. that

$$a_{11} \xi_1 + a_{12} \xi_2 + \dots + a_{1m} \xi_m = \eta_1,$$
$$a_{21} \xi_1 + a_{22} \xi_2 + \dots + a_{2m} \xi_m = \eta_2,$$
$$\dots \qquad (60.2)$$
$$a_{n1} \xi_1 + a_{n2} \xi_2 + \dots + a_{nm} \xi_m = \eta_n.$$

Thus, given fixed bases in X and Y, every linear operator generates 
relations (60.2) connecting the coordinates of image and inverse 
image. To determine the coordinates of the image from the coordi¬ 
nates of the inverse image it suffices to calculate the left-hand sides 
of (60.2). To determine the coordinates of the inverse image from the
known coordinates of the vector y we have to solve the system of
linear algebraic equations (60.2) for the unknowns $\xi_1, \xi_2, \dots, \xi_m$.
The matrix of the system coincides with the matrix of the operator. 
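
As an illustration of how the matrix is assembled from the images of the basis vectors and then used in relations (60.2), here is a minimal sketch. The operator, its images and the function names `matrix_from_images` and `apply` are assumptions made for the example only.

```python
def matrix_from_images(images):
    """Build the n x m matrix whose j-th column holds the coordinates of Ae_j.

    images[j] is the list of n coordinates of Ae_j relative to q_1, ..., q_n.
    """
    m, n = len(images), len(images[0])
    return [[images[j][i] for j in range(m)] for i in range(n)]

def apply(A, x):
    """Coordinates of y = Ax by (60.2): eta_i = sum over j of a_ij * xi_j."""
    return [sum(a_ij * xi_j for a_ij, xi_j in zip(row, x)) for row in A]

# Assumed operator from a 2-dimensional X to a 3-dimensional Y:
# Ae_1 = q_1 + 2 q_2,   Ae_2 = 3 q_2 - q_3.
images = [[1, 2, 0], [0, 3, -1]]
A = matrix_from_images(images)

print(A)                  # the 3 x 2 matrix of the operator
print(apply(A, [1, 0]))   # coordinates of Ae_1, i.e. the first column
print(apply(A, [2, 5]))   # coordinates of A(2 e_1 + 5 e_2)
```

Computing the image only requires evaluating the left-hand sides of (60.2); recovering an inverse image from y would require solving the system instead.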

Relations (60.2) establish a deep connection between linear opera¬ 
tors and systems of linear algebraic equations. In particular, it 
follows from (60.2) that the rank of an operator coincides with that
of the matrix of the operator and that the dimension of its kernel
coincides with the number of fundamental solutions of the reduced
homogeneous system. This fact trivially yields (56.4) and a number of other formulas.

We shall fairly often turn to the connection between linear algebra¬ 
ic equations and linear operators. But we shall first prove that there 
is a 1-1 correspondence between operators and matrices, which properly
speaking determine systems of the form (60.2). We have already
shown that every operator A determines some matrix $A_{qe}$ given
fixed bases. Now take an n x m matrix $A_{qe}$. With bases in X and Y
fixed, relations (60.2) assign to each vector $x \in X$ some vector $y \in Y$.
It is easy to verify that this correspondence is a linear operator. We
construct the matrix of this operator in the same bases. All the
coordinates of a vector $e_j$ are zero, except the jth coordinate which
is equal to unity. It follows from (60.2) that the coordinates of the
vector $Ae_j$ coincide with the elements of the jth column of $A_{qe}$ and
therefore $\{Ae_j\}_i = a_{ij}$. Hence the matrix of the constructed operator
coincides with the original matrix $A_{qe}$.

So every n X m matrix is the matrix of some linear operator from 
an m-dimensional space X to an n-dimensional space Y, with bases 
in X and Y fixed. A 1-1 correspondence is thus established between linear 
operators and rectangular matrices given any fixed bases. Of course 
both the vector spaces and the matrices are considered over the 
same field P. 

Consider some examples. Let 0 be a zero operator. We have

$$\{0 e_j\}_i = \{0\}_i = 0.$$

Hence all elements of the matrix of a zero operator are zero. Such
a matrix is called a zero matrix and designated 0.

Now take an identity operator E. For this operator we find

$$\{E e_j\}_i = \{e_j\}_i = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \neq j. \end{cases}$$

Therefore the matrix of an identity operator has the following form. 
It is a square matrix whose principal diagonal has unities and whose 
other elements are zeros. The matrix of an identity operator is 
called an identity or unit matrix and denoted by E. 

We shall fairly often deal with yet another type of matrix. Let
$\lambda_1, \lambda_2, \dots, \lambda_n$ be arbitrary numbers from a field P. We construct
a square matrix $\Lambda$, with the given numbers along the principal
diagonal and all other elements zero, i.e.

$$\Lambda = \begin{pmatrix} \lambda_1 & & & 0 \\ & \lambda_2 & & \\ & & \ddots & \\ 0 & & & \lambda_n \end{pmatrix}.$$

Matrices of this form are called diagonal. If all diagonal elements are
equal, then the matrix is said to be a scalar matrix. In particular,
an identity matrix is a scalar matrix. Rectangular matrices constructed
in a similar way will also be called diagonal. If we consider
relations (60.2), we shall easily see what the action of a linear operator
with a matrix $\Lambda$ is. This operator "stretches" the ith coordinate
of any vector by a factor of $\lambda_i$ for every i.





Exercises 

1. In a space of polynomials of degree not higher than n
a basis $1, t, t^2, \dots, t^n$ is fixed. Of what form is the matrix of the operator
of differentiation in that basis?

2. In a space X an operator P of projection onto a subspace S parallel to
a subspace T is given. Fix in X any basis made up as a union of the bases of S
and T. Of what form are the matrices of P and E - P in that basis?

3. Let A be a linear operator from X to Y. Denote by $M_A$ a subspace in X
complementary to the kernel $N_A$ and by $R_A$ a subspace in Y complementary
to $T_A$. How will the matrix of A change if in choosing bases in X and Y we use
bases of some or all of the indicated subspaces?


61. Operations on matrices 

As we have shown, given fixed bases in spaces 
every linear operator is uniquely defined by its matrix. Therefore 
the operations on operators discussed earlier lead to quite definite 
operations on matrices. In the questions of interest to us now the 
choice of basis plays no part and therefore operators and their ma¬ 
trices will be denoted by the same letters with no indices relating 
to bases. 

Let two equal operators from an m-dimensional space X to an 
n-dimensional space Y be given. Since equal operators exhibit same¬ 
ness in all situations, they will have the same matrix. This justifies 
the following definition. 

Matrices A and B of the same size n x m with elements $a_{ij}$ and
$b_{ij}$ are said to be equal if

$$a_{ij} = b_{ij}$$

for i = 1, 2, ..., n and j = 1, 2, ..., m. The equality of matrices
is designated

$$A = B.$$

Suppose now that A and B are two operators from X to Y. Consider
an operator C = A + B. Denote the elements of the matrices
of the operators respectively by $c_{ij}$, $a_{ij}$ and $b_{ij}$. According to the
foregoing $c_{ij} = \{Ce_j\}_i$. Considering the definition of a sum of operators
and the properties of the coordinates of vectors relative to the
operations on them we get

$$c_{ij} = \{Ce_j\}_i = \{(A + B) e_j\}_i = \{Ae_j + Be_j\}_i = \{Ae_j\}_i + \{Be_j\}_i = a_{ij} + b_{ij}.$$

Therefore:

A sum of two matrices A and B of the same size n x m with
elements $a_{ij}$ and $b_{ij}$ is a matrix C of the same size with elements $c_{ij}$ if

$$c_{ij} = a_{ij} + b_{ij}$$

for i = 1, 2, ..., n and j = 1, 2, ..., m. A sum of matrices is
designated

$$C = A + B.$$

A difference of two matrices A and B of the same size n x m
with elements $a_{ij}$ and $b_{ij}$ is a matrix C of the same size with elements
$c_{ij}$ if

$$c_{ij} = a_{ij} - b_{ij}$$

for i = 1, 2, ..., n and j = 1, 2, ..., m. A difference of matrices is
designated

$$C = A - B.$$

Consider an operator A from X to Y and an operator $C = \lambda A$ for
some number $\lambda$. If $a_{ij}$ and $c_{ij}$ are elements of the matrices of the
operators, then

$$c_{ij} = \{Ce_j\}_i = \{\lambda A e_j\}_i = \lambda \{Ae_j\}_i = \lambda a_{ij},$$

and we arrive at the following definition:

A product of an n x m matrix A with elements $a_{ij}$ by a number $\lambda$
is an n x m matrix C with elements $c_{ij}$ if

$$c_{ij} = \lambda a_{ij}$$

for i = 1, 2, ..., n and j = 1, 2, ..., m. A product of a matrix by
a number is designated

$$C = \lambda A.$$

Let an m-dimensional space X and an n-dimensional space Y be
given over the same field P. As proved earlier, given fixed bases in X
and Y there is a 1-1 correspondence between the set $\omega_{XY}$ of all operators
from X to Y and the set of all n x m matrices with elements
from P. Since operations on matrices were introduced in accordance
with operations on operators, the set of n x m matrices, just
as the set $\omega_{XY}$, is a vector space.

It is easy to exhibit one of the bases of the space of matrices. It is,
for example, a system of matrices $A^{(kp)}$ for k = 1, 2, ..., n and
p = 1, 2, ..., m, where the elements $a^{(kp)}_{ij}$ of a matrix $A^{(kp)}$ are
defined by the following equations:

$$a^{(kp)}_{ij} = \begin{cases} 1 & \text{if } i = k,\ j = p, \\ 0 & \text{otherwise.} \end{cases}$$

In the space $\omega_{XY}$ a basis is a system of operators with matrices $A^{(kp)}$.
From this we conclude that the vector space of operators from X to Y
is a finite dimensional space and that its dimension is equal to mn.
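
The basis of "matrix units" can be made concrete with a short sketch. The sample matrix and the helper names `basis_matrix`, `add` and `scale` are assumptions for illustration; indices are 1-based to match the text.

```python
def basis_matrix(n, m, k, p):
    """n x m matrix with 1 at row k, column p (1-based) and 0 elsewhere."""
    return [[1 if (i, j) == (k, p) else 0 for j in range(1, m + 1)]
            for i in range(1, n + 1)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(lam, A):
    return [[lam * a for a in row] for row in A]

n, m = 2, 3
A = [[4, 0, -1], [2, 7, 5]]   # assumed 2 x 3 sample matrix

# Expand A with respect to the basis matrices: the coefficient of the
# (k, p) basis matrix is simply the element a_kp.
total = [[0] * m for _ in range(n)]
count = 0
for k in range(1, n + 1):
    for p in range(1, m + 1):
        total = add(total, scale(A[k - 1][p - 1], basis_matrix(n, m, k, p)))
        count += 1

print(total == A)   # the expansion reproduces A
print(count)        # number of basis matrices, equal to m * n
```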

Let X, Y and Z be vector spaces, let A be an operator from X to Y
and let B be an operator from Y to Z. Also let m, n and p be the
dimensions of X, Y and Z respectively. Assume that bases
$e_1, \dots, e_m$, $q_1, \dots, q_n$ and $r_1, \dots, r_p$ are fixed in X, Y and Z. The
operator A has an n x m matrix with elements $a_{ij}$, with

$$Ae_j = \sum_{s=1}^{n} a_{sj} q_s.$$

The operator B has a p x n matrix with elements $b_{ij}$, with

$$Bq_s = \sum_{k=1}^{p} b_{ks} r_k.$$

Investigating the matrix of the operator C = BA we conclude that
it must be p x m and that its elements are as follows:

$$c_{ij} = \{Ce_j\}_i = \{BAe_j\}_i = \Big\{B\Big(\sum_{s=1}^{n} a_{sj} q_s\Big)\Big\}_i = \Big\{\sum_{s=1}^{n} a_{sj} B q_s\Big\}_i$$
$$= \Big\{\sum_{s=1}^{n} a_{sj} \sum_{k=1}^{p} b_{ks} r_k\Big\}_i = \Big\{\sum_{k=1}^{p} \Big(\sum_{s=1}^{n} b_{ks} a_{sj}\Big) r_k\Big\}_i = \sum_{s=1}^{n} b_{is} a_{sj}.$$

This formula suggests the following definition:

A product of a p x n matrix B with elements $b_{ij}$ and an n x m
matrix A with elements $a_{ij}$ is a p x m matrix C with elements $c_{ij}$ if

$$c_{ij} = \sum_{s=1}^{n} b_{is} a_{sj} \qquad (61.1)$$

for i = 1, 2, ..., p and j = 1, 2, ..., m. A product of matrices is
designated

$$C = BA.$$

Thus a product is defined only for the matrices in which the number
of columns of the left factor is equal to the number of rows of
the right factor. The element of the matrix of the product at the intersection
of the ith row and the jth column is equal to the sum of the
products of all the elements of the ith row of the left factor by the
corresponding elements of the jth column of the right factor.
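
The row-by-column rule can be sketched directly from the definition. In the code below the two sample matrices and the function names `matmul` and `apply` are assumptions; the final check confirms on one vector that the product matrix acts as the composite operator BA.

```python
def matmul(B, A):
    """Product C = BA: c_ij is the sum over s of b_is * a_sj."""
    p, n, m = len(B), len(A), len(A[0])
    if any(len(row) != n for row in B):
        raise ValueError("columns of the left factor must equal rows of the right factor")
    return [[sum(B[i][s] * A[s][j] for s in range(n)) for j in range(m)]
            for i in range(p)]

def apply(M, x):
    """Coordinates of the image of x under the operator with matrix M."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in M]

A = [[1, 2], [0, 1], [3, -1]]   # assumed 3 x 2 matrix (operator from X to Y)
B = [[2, 0, 1], [1, 1, 0]]      # assumed 2 x 3 matrix (operator from Y to Z)
C = matmul(B, A)                # 2 x 2 matrix of the operator BA

x = [5, -2]
print(C)
print(apply(C, x) == apply(B, apply(A, x)))   # applying C equals applying A, then B
```

Calling `matmul(A, A)` here raises an error, since a 3 x 2 matrix cannot be multiplied by a 3 x 2 matrix.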

We recall once again that there is a 1-1 correspondence between 
linear operators and matrices. Operations on matrices were intro¬ 
duced according to operations on operators. Therefore the operation 
of matrix multiplication is connected by relations (58.1) with the 
operations of matrix addition and of multiplication of a matrix by 
a number. 

We have already noted that a ring of operators and the group of
all nonsingular operators in a vector space are noncommutative. To
prove this it obviously suffices to find two nonsingular square matrices A and B
such that $AB \neq BA$. Take, for example,

$$A = \begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}.$$

It is obvious that

$$AB = \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}, \qquad BA = \begin{pmatrix} 1 & 1 \\ 1 & 2 \end{pmatrix}$$

and the noncommutativity of multiplication is proved.

The operation of matrix multiplication provides a convenient way
of writing relations of the type (60.2). Denote by $x_e$ an m x 1 matrix
made up of the coordinates of a vector x and by $y_q$ an n x 1 matrix
made up of the coordinates of a vector y. Then relations (60.2) are
equivalent to a single matrix equation

$$A_{qe} x_e = y_q \qquad (61.2)$$

which is called a coordinate equation corresponding to the operator
equation

$$Ax = y$$

and relates in matrix form the coordinates of inverse image and
image by means of the matrix of the operator.

It is important to note that the coordinate and the operator equa¬ 
tion look quite alike from the notational point of view if of course 
the indices are dropped and the symbol Ax is understood as a prod¬ 
uct of A by x. Since the notation and the properties of operations 
on matrices and operators coincide, any transformation of an opera¬ 
tor equation leads to the same transformation of the coordinate 
equation. Therefore formally it makes no difference whether we deal
with matrix equations or with operator equations. 

In what follows we shall actually draw no distinction between 
operator and coordinate equations. Moreover, all new notions and 
facts that hold for operators will as a rule be extended to matrices, unless 
otherwise noted. 


Exercises 

1. Prove that operations on matrices are related to the
operation of transposition by

$$(\alpha A)' = \alpha A', \qquad (A + B)' = A' + B',$$

$$(AB)' = B'A', \qquad (A')' = A.$$

2. Prove that every linear operator of rank r can be represented as a sum
of r linear operators of rank 1 and cannot be represented as a sum of a smaller
number of operators of rank 1.

3. Prove that an n x m matrix has rank 1 if and only if it can be represented
as a product of two nonzero, n x 1 and 1 x m, matrices.

4. Suppose that for fixed matrices A and B we have

$$AC = BC$$

for any matrix C. Prove that A = B.

5. Find the general form of a square matrix commutative with a given diago¬ 
nal matrix. 

6. Prove that for a matrix to be a scalar matrix it is necessary and sufficient 
that it should be commutative with all square matrices. 

7. A sum of the diagonal elements of a matrix A is called the trace of A and 
designated tr A. Prove that 

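$$\operatorname{tr} A = \operatorname{tr} A', \qquad \operatorname{tr} (\alpha A) = \alpha \operatorname{tr} A,$$

$$\operatorname{tr} (A + B) = \operatorname{tr} A + \operatorname{tr} B, \qquad \operatorname{tr} (BA) = \operatorname{tr} (AB).$$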

8. Prove that a real matrix A is zero if and only if tr (AA') = 0. 


62. Matrices and determinants 

Matrices play a very important part in the 
study of linear operators, with determinants not infrequently used 
as an auxiliary tool. We discuss now some questions connected with 
matrices and determinants. 

Let A be a nonsingular operator in a space X. Its rank coincides
with the dimension of X. According to formulas (60.2) this means
that the rank of the system of columns of the matrix of the operator
coincides with the number of them. This is possible if and only if
the determinant of the matrix of the operator is nonzero. So

An operator in a vector space is nonsingular if and only if the determinant
of its matrix is nonzero.

This property of a nonsingular operator justifies the following 
definitions: 

A square matrix is said to be nonsingular if its determinant is 
nonzero and singular otherwise. 

Of course, relying on the corresponding properties of nonsingular 
operators we can say that a product of nonsingular matrices is again 
a nonsingular matrix, that all nonsingular matrices form a group 
relative to multiplication, that every nonsingular matrix generates 
a cyclic group and so on. Their connection with nonsingular operators
allows us to say that every nonsingular matrix A has a unique
matrix $A^{-1}$ such that

$$A^{-1} A = A A^{-1} = E. \qquad (62.1)$$

The matrix $A^{-1}$ is called the inverse of the matrix A.

Using the concept of determinant it is possible to express the
explicit form of the elements of $A^{-1}$ in terms of the minors of the
matrix A. Formulas (40.5) to (40.9) provide a basis for this. Taking
into account formula (61.1) for an element of a product of two matrices
we conclude that equations (62.1) are satisfied by the matrix

$$A^{-1} = \begin{pmatrix} \dfrac{A_{11}}{d} & \dfrac{A_{21}}{d} & \dots & \dfrac{A_{m1}}{d} \\ \dfrac{A_{12}}{d} & \dfrac{A_{22}}{d} & \dots & \dfrac{A_{m2}}{d} \\ \vdots & \vdots & & \vdots \\ \dfrac{A_{1m}}{d} & \dfrac{A_{2m}}{d} & \dots & \dfrac{A_{mm}}{d} \end{pmatrix}.$$

Here d is the determinant of the matrix A and $A_{ij}$ is the cofactor of $a_{ij}$,
an element of A. By virtue of the uniqueness of the inverse matrix
it is the only form it can have.
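
The cofactor formula for the inverse can be tried out on a small matrix. This is a sketch under our own assumptions: exact rational arithmetic via the standard `fractions` module, a determinant computed by first-row expansion (adequate only for small sizes), and hypothetical helper names.

```python
from fractions import Fraction

def submatrix(A, i, j):
    """Matrix A with row i and column j deleted (0-based)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(A) if k != i]

def det(A):
    """Determinant by expansion along the first row."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j] * det(submatrix(A, 0, j))
               for j in range(len(A)))

def inverse(A):
    """Element (i, j) of the inverse is the cofactor of a_ji divided by d."""
    d = det(A)
    if d == 0:
        raise ValueError("matrix is singular")
    n = len(A)
    return [[Fraction((-1) ** (i + j) * det(submatrix(A, j, i)), d)
             for j in range(n)] for i in range(n)]

def matmul(L, R):
    return [[sum(L[i][s] * R[s][j] for s in range(len(R)))
             for j in range(len(R[0]))] for i in range(len(L))]

A = [[2, 1, 0], [1, 3, 1], [0, 1, 4]]     # assumed nonsingular sample matrix
E = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
A_inv = inverse(A)
print(matmul(A_inv, A) == E and matmul(A, A_inv) == E)   # checks (62.1)
```

Note the transposition of indices: the cofactors of the ith row of A land in the ith column of the inverse.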

We introduce abbreviated notation for the minors of an arbitrary
matrix A. A minor on the rows $i_1, i_2, \dots, i_p$ and columns $j_1, j_2, \dots, j_p$
is designated

$$A\begin{pmatrix} i_1 & i_2 & \dots & i_p \\ j_1 & j_2 & \dots & j_p \end{pmatrix}.$$

In addition, it is assumed that if any indices in the upper (lower) row
in the notation of a minor coincide, then this means that so do the
corresponding rows (columns) of the minor itself.

Theorem 62.1 (Cauchy-Binet formula). Let a square n x n matrix C
be equal to the product of two rectangular matrices A and B, of size
n x m and m x n respectively, with $m \geq n$. Then

$$C\begin{pmatrix} 1 & 2 & \dots & n \\ 1 & 2 & \dots & n \end{pmatrix} = \sum_{1 \leq k_1 < k_2 < \dots < k_n \leq m} A\begin{pmatrix} 1 & 2 & \dots & n \\ k_1 & k_2 & \dots & k_n \end{pmatrix} B\begin{pmatrix} k_1 & k_2 & \dots & k_n \\ 1 & 2 & \dots & n \end{pmatrix}. \qquad (62.2)$$


Proof. Denote by $a_{ij}$, $b_{ij}$ and $c_{ij}$ elements of A, B and C. According
to the definition of a matrix product we have

$$c_{ij} = \sum_{s=1}^{m} a_{is} b_{sj}.$$

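Substituting for the elements of C their expressions and using the
linear property of the determinant for column vectors we find

$$\begin{aligned}
\det \begin{pmatrix} c_{11} & \dots & c_{1n} \\ \vdots & & \vdots \\ c_{n1} & \dots & c_{nn} \end{pmatrix}
&= \det \begin{pmatrix} \sum_{s_1=1}^{m} a_{1 s_1} b_{s_1 1} & \sum_{s_2=1}^{m} a_{1 s_2} b_{s_2 2} & \dots & \sum_{s_n=1}^{m} a_{1 s_n} b_{s_n n} \\ \vdots & \vdots & & \vdots \\ \sum_{s_1=1}^{m} a_{n s_1} b_{s_1 1} & \sum_{s_2=1}^{m} a_{n s_2} b_{s_2 2} & \dots & \sum_{s_n=1}^{m} a_{n s_n} b_{s_n n} \end{pmatrix} \\
&= \sum_{s_1=1}^{m} \det \begin{pmatrix} a_{1 s_1} b_{s_1 1} & \sum_{s_2=1}^{m} a_{1 s_2} b_{s_2 2} & \dots & \sum_{s_n=1}^{m} a_{1 s_n} b_{s_n n} \\ \vdots & \vdots & & \vdots \\ a_{n s_1} b_{s_1 1} & \sum_{s_2=1}^{m} a_{n s_2} b_{s_2 2} & \dots & \sum_{s_n=1}^{m} a_{n s_n} b_{s_n n} \end{pmatrix} \\
&= \dots = \sum_{s_1=1}^{m} \sum_{s_2=1}^{m} \dots \sum_{s_n=1}^{m} \det \begin{pmatrix} a_{1 s_1} b_{s_1 1} & a_{1 s_2} b_{s_2 2} & \dots & a_{1 s_n} b_{s_n n} \\ \vdots & \vdots & & \vdots \\ a_{n s_1} b_{s_1 1} & a_{n s_2} b_{s_2 2} & \dots & a_{n s_n} b_{s_n n} \end{pmatrix} \\
&= \sum_{s_1=1}^{m} \sum_{s_2=1}^{m} \dots \sum_{s_n=1}^{m} A\begin{pmatrix} 1 & 2 & \dots & n \\ s_1 & s_2 & \dots & s_n \end{pmatrix} b_{s_1 1} b_{s_2 2} \dots b_{s_n n}. \qquad (62.3)
\end{aligned}$$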


Each of the indices $s_1, s_2, \dots, s_n$ is independent of the others and
may take on any values from 1 to m and so the expression obtained
is a sum of $m^n$ terms. In that sum every term in which at least two
indices are the same is zero, since the corresponding minor of
the matrix A then has two equal columns. All the other terms can be divided into groups of n!
terms each, regarding as a group all terms the values of whose indices
form the same collection of numbers.

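Denote by $k_1, k_2, \dots, k_n$ the values of the indices $s_1, s_2, \dots, s_n$
arranged in increasing order. Let

$$\varepsilon(s_1, s_2, \dots, s_n) = (-1)^N,$$

where N is the number of transpositions required for a permutation
$s_1, s_2, \dots, s_n$ to be transformed into $k_1, k_2, \dots, k_n$. Then within
one group of values of the indices $s_1, s_2, \dots, s_n$ the sum of the corresponding
terms in (62.3) will be equal to

$$\begin{aligned}
&\sum \varepsilon(s_1, s_2, \dots, s_n)\, A\begin{pmatrix} 1 & 2 & \dots & n \\ k_1 & k_2 & \dots & k_n \end{pmatrix} b_{s_1 1} b_{s_2 2} \dots b_{s_n n} \\
&\quad = A\begin{pmatrix} 1 & 2 & \dots & n \\ k_1 & k_2 & \dots & k_n \end{pmatrix} \sum \varepsilon(s_1, s_2, \dots, s_n)\, b_{s_1 1} b_{s_2 2} \dots b_{s_n n} \\
&\quad = A\begin{pmatrix} 1 & 2 & \dots & n \\ k_1 & k_2 & \dots & k_n \end{pmatrix} B\begin{pmatrix} k_1 & k_2 & \dots & k_n \\ 1 & 2 & \dots & n \end{pmatrix}.
\end{aligned}$$

It is from this relation that (62.2) follows.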
Corollary. The determinant of the product of two square matrices
equals the product of the determinants of the factors.

In this case the sum in (62.2) will consist of a single term. Therefore

$$C\begin{pmatrix} 1 & 2 & \dots & n \\ 1 & 2 & \dots & n \end{pmatrix} = A\begin{pmatrix} 1 & 2 & \dots & n \\ 1 & 2 & \dots & n \end{pmatrix} B\begin{pmatrix} 1 & 2 & \dots & n \\ 1 & 2 & \dots & n \end{pmatrix}$$

or equivalently

$$\det C = \det A \cdot \det B.$$

Corollary. Let a square n x n matrix C equal the product of two
rectangular, n x m and m x n, matrices A and B, with m < n.
Then det C = 0.

Indeed, add n - m zero columns on the right of A and
n - m zero rows at the bottom of B, and the matrices obtained become square
n x n matrices with zero determinants. The product of those matrices
is the matrix C. Therefore according to the first corollary det C = 0.
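
Formula (62.2) and the second corollary lend themselves to a direct numerical check. The sketch below assumes small integer matrices of our own choosing and uses `itertools.combinations` to enumerate the index sets $k_1 < k_2 < \dots < k_n$; the helper names are hypothetical.

```python
from itertools import combinations

def det(A):
    """Determinant by expansion along the first row (small matrices only)."""
    if not A:
        return 1
    return sum((-1) ** j * A[0][j]
               * det([row[:j] + row[j + 1:] for row in A[1:]])
               for j in range(len(A)))

def matmul(L, R):
    return [[sum(L[i][s] * R[s][j] for s in range(len(R)))
             for j in range(len(R[0]))] for i in range(len(L))]

def cauchy_binet(A, B):
    """Right-hand side of (62.2) for an n x m matrix A and an m x n matrix B."""
    n, m = len(A), len(B)
    total = 0
    for cols in combinations(range(m), n):            # k_1 < k_2 < ... < k_n
        minor_A = [[A[i][k] for k in cols] for i in range(n)]
        minor_B = [[B[k][j] for j in range(n)] for k in cols]
        total += det(minor_A) * det(minor_B)
    return total

A = [[1, 2, 0], [3, -1, 4]]        # assumed 2 x 3 matrix
B = [[2, 1], [0, 1], [-1, 5]]      # assumed 3 x 2 matrix

# Case m >= n: the determinant of the 2 x 2 product AB equals the sum of
# products of minors.
print(det(matmul(A, B)) == cauchy_binet(A, B))

# Case m < n: for the 3 x 3 product BA the sum is empty and det is zero.
print(det(matmul(B, A)), cauchy_binet(B, A))
```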


Exercises 

1. Prove that $(A^{-1})' = (A')^{-1}$ for any nonsingular
matrix A.

2. Prove that $\det (A^{-1}) = (\det A)^{-1}$ for any nonsingular matrix A.

3. Prove that $\det (\alpha A) = \alpha^n \det A$ for any square n x n matrix A.

4. Prove that if AB = E for square matrices A and B, then A is nonsingular
and $B = A^{-1}$.

5. Write a formula of the type (62.2) for an arbitrary minor of the product
of two matrices.

6. Prove that for any real matrix A all the principal minors of the matrices
A'A and AA' are nonnegative.

7. Prove that the rank of a product of matrices is not greater than the rank 
of each of the factors. 

8. Prove that multiplying by a nonsingular matrix leaves the rank unaffected. 


63. Change of basis 

Given fixed bases in spaces the coordinate equa¬ 
tion allows a complete study of the action of the linear operator. It 
is obvious that the study is more efficient the simpler the form of the 
matrix of the operator is. In general the matrices of operators are 
dependent on the choice of bases and our immediate task is to clarify 
this dependence. 

Let $e_1, e_2, \dots, e_m$ and $f_1, f_2, \dots, f_m$ be two bases of the same
m-dimensional space X. The vectors $f_1, f_2, \dots, f_m$ are uniquely
defined by their expansions

$$f_1 = p_{11} e_1 + p_{21} e_2 + \dots + p_{m1} e_m,$$
$$f_2 = p_{12} e_1 + p_{22} e_2 + \dots + p_{m2} e_m,$$
$$\dots \qquad (63.1)$$
$$f_m = p_{1m} e_1 + p_{2m} e_2 + \dots + p_{mm} e_m$$

with respect to the vectors $e_1, e_2, \dots, e_m$. The coefficients $p_{ij}$ define
a matrix

$$P = \begin{pmatrix} p_{11} & p_{12} & \dots & p_{1m} \\ p_{21} & p_{22} & \dots & p_{2m} \\ \vdots & \vdots & & \vdots \\ p_{m1} & p_{m2} & \dots & p_{mm} \end{pmatrix}$$

called the coordinate transformation matrix for a change from a basis
$e_1, e_2, \dots, e_m$ to a basis $f_1, f_2, \dots, f_m$.

Take a vector $x \in X$ and expand it with respect to the vectors of
both bases. Let

$$x = \sum_{i=1}^{m} \xi_i e_i = \sum_{j=1}^{m} \eta_j f_j.$$

By (63.1) we have

$$\sum_{i=1}^{m} \xi_i e_i = \sum_{j=1}^{m} \eta_j f_j = \sum_{j=1}^{m} \eta_j \sum_{i=1}^{m} p_{ij} e_i = \sum_{i=1}^{m} \Big(\sum_{j=1}^{m} p_{ij} \eta_j\Big) e_i.$$

Comparing the coefficients of $e_i$ on the left and the right of these
relations we find

$$\xi_i = \sum_{j=1}^{m} p_{ij} \eta_j \qquad (63.2)$$


for i = 1, 2, ..., m. These formulas are called coordinate transformation
formulas. As before denote by $x_e$ and $x_f$ the m x 1 matrices made
up of the coordinates of the vector x in the corresponding bases.
Formulas (63.2) show that

$$x_e = P x_f. \qquad (63.3)$$

A coordinate transformation matrix must be nonsingular, since
otherwise there will be linear dependence among its columns and
hence among the vectors $f_1, f_2, \dots, f_m$. Of course, any nonsingular
matrix is the matrix of some coordinate transformation defined by
equation (63.3). Multiplying (63.3) on the left by the matrix $P^{-1}$
we get

$$x_f = P^{-1} x_e.$$

Now let $e_1, \dots, e_m$, $f_1, \dots, f_m$ and $r_1, \dots, r_m$ be three bases in
a vector space X. A change from the first basis to the third can be
effected in two ways: either directly from the first to the third basis
or from the first to the second and then from the second to the third.
It is not hard to establish the connection between the corresponding
coordinate transformation matrices. By (63.3)

$$x_e = P x_f, \qquad x_f = R x_r, \qquad x_e = S x_r.$$

From the first two relations it follows that

$$x_e = P x_f = P (R x_r) = (PR) x_r,$$

which implies that

$$S = PR.$$


Thus, with coordinate transformations successively carried out, the 
matrix of the resulting transformation is equal to the product 
of the matrices of the constituent transformations. 
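
A small numerical sketch of successive coordinate transformations follows. The three bases of the plane are assumptions chosen for the example, coordinate columns are stored as m x 1 matrices as in (63.3), and `inverse2` is a hypothetical helper valid only for 2 x 2 matrices.

```python
from fractions import Fraction

def matmul(L, R):
    return [[sum(L[i][s] * R[s][j] for s in range(len(R)))
             for j in range(len(R[0]))] for i in range(len(L))]

def inverse2(M):
    """Inverse of a nonsingular 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("a coordinate transformation matrix must be nonsingular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

# Columns of P: coordinates of f_1, f_2 in the basis e_1, e_2
# (assumed: f_1 = e_1 + e_2, f_2 = e_1 - e_2).
P = [[1, 1], [1, -1]]
# Columns of R: coordinates of r_1, r_2 in the basis f_1, f_2
# (assumed: r_1 = 2 f_1, r_2 = f_1 + f_2).
R = [[2, 1], [0, 1]]

x_r = [[3], [4]]           # coordinates of x in the third basis
x_f = matmul(R, x_r)       # change from the third basis to the second
x_e = matmul(P, x_f)       # change from the second basis to the first
S = matmul(P, R)           # matrix of the direct change

print(x_e == matmul(S, x_r))             # both routes give the same coordinates
print(matmul(inverse2(P), x_e) == x_f)   # recovering x_f from x_e
```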

Again consider a linear operator A from X to Y. Choose two bases,
$e_1, \dots, e_m$ and $f_1, \dots, f_m$, in X and two bases, $q_1, \dots, q_n$ and
$t_1, \dots, t_n$, in Y. Corresponding to the same operator A in the first
pair of bases is the coordinate equation

$$y_q = A_{qe} x_e \qquad (63.4)$$

and in the second

$$y_t = A_{tf} x_f. \qquad (63.5)$$

Accordingly we have two matrices $A_{qe}$ and $A_{tf}$ for the same operator A.

Denote by P a coordinate transformation matrix for a change
from the basis $e_1, \dots, e_m$ to the basis $f_1, \dots, f_m$ and by Q a coordinate
transformation matrix for a change from $q_1, \dots, q_n$ to $t_1, \dots, t_n$.
We have

$$x_e = P x_f, \qquad y_q = Q y_t. \qquad (63.6)$$

Substituting these expressions for $x_e$ and $y_q$ in (63.4) we find

$$Q y_t = A_{qe} P x_f,$$

which yields

$$y_t = (Q^{-1} A_{qe} P) x_f.$$

Comparing this with (63.5) we conclude that

$$A_{tf} = Q^{-1} A_{qe} P. \qquad (63.7)$$


This is the desired relation connecting the matrices of the same 
operator in different bases. 
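
Relation (63.7) can be checked on a small example. The matrices below are assumptions (both spaces are taken two-dimensional so that a simple 2 x 2 inverse suffices), and the helper names are hypothetical.

```python
from fractions import Fraction

def matmul(L, R):
    return [[sum(L[i][s] * R[s][j] for s in range(len(R)))
             for j in range(len(R[0]))] for i in range(len(L))]

def inverse2(M):
    """Inverse of a nonsingular 2 x 2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

A_qe = [[1, 2], [3, 4]]    # assumed matrix of A in the first pair of bases
P = [[1, 1], [1, -1]]      # assumed change of basis in X (from e to f)
Q = [[1, 1], [0, 1]]       # assumed change of basis in Y (from q to t)

# Matrix of the same operator in the second pair of bases, by (63.7).
A_tf = matmul(inverse2(Q), matmul(A_qe, P))
print(A_tf)

# Consistency check on one vector: Q y_t must equal A_qe x_e.
x_f = [[2], [1]]
y_t = matmul(A_tf, x_f)
print(matmul(Q, y_t) == matmul(A_qe, matmul(P, x_f)))
```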


Exercises 

1. Prove that the rank of the matrix of an operator is 
not affected by a change to other bases. 

2. Prove that the determinant of the matrix of an operator in a vector space 
is independent of the choice of basis. 

3. What correspondence can be established between nonsingular operators 
in a space X and transformations of coordinates in the same space? 

4. Let us say that two bases of the same real space are of the same sign if
the determinant of their coordinate transformation matrix is positive. Prove 
that all bases can be divided into two classes of bases of the same sign. 

5. Let us say that one class of bases of the same sign is left-handed and the 
other is right-handed. Compare these classes with those described in Section 34. 
64. Equivalent and similar matrices

Corresponding to every linear operator A from 
a space X to a space Y is a set of its matrices defined by the possi¬ 
bility of choosing different bases in X and Y. The structure of that 
set is essentially different according as X coincides with Y or 
does not. 

Two rectangular matrices A and B of the same size are said to be 
equivalent if there are two nonsingular square matrices R and S 
such that 

B = RAS. 

It follows from (63.7) that two matrices corresponding to the same 
linear operator when different bases are chosen in X and Y are 
always equivalent. It is not hard to see that the converse is also
true. That is, two equivalent matrices always correspond to the same 
linear operator in suitably chosen bases. Thus, corresponding to 
every linear operator mapping X into Y is a class of equivalent 
matrices. 

Theorem 64.1. For two rectangular matrices of the same size to be
equivalent it is necessary and sufficient that they should have the same
rank.

Proof. Multiplying any matrix by nonsingular matrices leaves
its rank unaffected and therefore equivalent matrices have the same
rank. Now let two matrices of the same size have the same rank. We
prove that they are equivalent. We prove even more, namely that
every matrix of rank r is equivalent to the matrix

$$I_r = \begin{pmatrix}
1 & 0 & \dots & 0 & 0 & \dots & 0 \\
0 & 1 & \dots & 0 & 0 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\
0 & 0 & \dots & 1 & 0 & \dots & 0 \\
0 & 0 & \dots & 0 & 0 & \dots & 0 \\
\vdots & \vdots & & \vdots & \vdots & & \vdots \\
0 & 0 & \dots & 0 & 0 & \dots & 0
\end{pmatrix}$$

whose first r diagonal elements are unities and whose other elements are zeros.

Let a rectangular n x m matrix be given. It defines some linear
operator A mapping a space X with a basis $e_1, e_2, \dots, e_m$ into a space
Y with a basis $q_1, q_2, \dots, q_n$. Denote by r the number of linearly
independent vectors among the images of the vectors of the basis
$Ae_1, Ae_2, \dots, Ae_m$. We may assume without loss of generality that
it is the vectors $Ae_1, Ae_2, \dots, Ae_r$ that are linearly independent,
since this can be achieved by a proper numbering of the basis vectors.
The remaining vectors $Ae_{r+1}, \dots, Ae_m$ can be linearly expressed in
terms of them,

$$Ae_k = \sum_{j=1}^{r} c_{kj} Ae_j \qquad (64.1)$$

for k = r + 1, ..., m. We define a new basis $f_1, f_2, \dots, f_m$ in X as
follows

$$f_k = \begin{cases} e_k, & k = 1, 2, \dots, r, \\ e_k - \sum_{j=1}^{r} c_{kj} e_j, & k = r + 1, \dots, m. \end{cases} \qquad (64.2)$$

Then by (64.1)

$$Af_k = 0 \qquad (64.3)$$

for k = r + 1, ..., m. Set, further,

$$t_j = Af_j \qquad (64.4)$$

for j = 1, 2, ..., r. Vectors $t_1, t_2, \dots, t_r$ are by assumption linearly
independent. Supplement them with some vectors $t_{r+1}, \dots, t_n$ to
a basis in Y and consider the matrix of the operator A in the new
bases $f_1, \dots, f_m$ and $t_1, \dots, t_n$. The coefficients of the kth column of
the matrix coincide with the coordinates of the vector $Af_k$ in the
basis $t_1, \dots, t_n$. According to (64.3) and (64.4) the matrix of A will
coincide with $I_r$.

The original matrix and $I_r$ correspond to the same operator and
therefore they are equivalent. Hence all matrices of the same rank
are equivalent to $I_r$ and therefore to one another.

While proving the theorem we answered a very important question: 
How are bases in spaces X and Y to be chosen for the matrix of the 
linear operator to have the simplest form? Besides we have shown an 
explicit form of that simplest matrix. 
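
The criterion of Theorem 64.1 turns the question of equivalence into a rank computation. The sketch below is an illustration under our own assumptions: rank is found by Gaussian elimination in exact rational arithmetic, and the names `rank`, `canonical_form` and `equivalent` are hypothetical.

```python
from fractions import Fraction

def rank(A):
    """Rank by Gaussian elimination over exact fractions."""
    M = [[Fraction(a) for a in row] for row in A]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue                      # no pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def canonical_form(n, m, r):
    """n x m matrix with r unities on the principal diagonal, zeros elsewhere."""
    return [[1 if i == j and i < r else 0 for j in range(m)] for i in range(n)]

def equivalent(A, B):
    """Equivalence test for matrices of the same size: equal ranks."""
    same_size = len(A) == len(B) and len(A[0]) == len(B[0])
    return same_size and rank(A) == rank(B)

A = [[1, 2, 3], [2, 4, 6]]    # assumed sample matrices
B = [[0, 0, 5], [0, 0, 0]]
C = [[1, 0, 0], [0, 1, 0]]

print(equivalent(A, B), equivalent(A, C))
print(canonical_form(2, 3, rank(A)))   # simplest matrix equivalent to A
```

The code only decides whether two matrices are equivalent; it does not construct the bases used in the proof.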

So simple and effective an answer has turned out to be possible 
because bases in X and Y could be chosen independently of each other. 
Now let A be an operator in a space X. Of course, we could again 
consider images and inverse images in different bases, but it is not 
natural now since both images and inverse images are in the same 
space. Using different bases would greatly hamper the study of the
action of the operator on the vectors of the space X. If there is one 
basis, then the matrices P and Q in (63.6) coincide. Hence, corre¬ 
sponding to every linear operator in a vector space is a class of matri¬ 
ces connected by the relations 

$$B = P^{-1} A P \qquad (64.5)$$

for different nonsingular matrices P. Such matrices are called similar 
and a matrix P is called a similarity transformation matrix. 
The question as to when two matrices can be similar is rather
complicated and we shall get the answer much later. As complicated 
is the question of what the form of the simplest of all similar matrices 
is like. Devoted to these studies are the next two chapters. 

Exercises 

1. Prove that the equivalence and similarity criteria 
of matrices are equivalence relations. 

2. Prove that similar matrices have the same trace and the same determi¬ 
nant. 

3. Prove that under the same similarity transformation a cyclic group of 
nonsingular matrices goes over into a cyclic group. 

4. Prove that under the same similarity transformation a linear subspace of 
matrices goes over into a linear subspace. 

5. On a set of square matrices of the same size consider an operator of simi¬ 
larity transformation of those matrices using a fixed similarity transformation 
matrix. Prove that that operator is linear. 

6. Prove that the set of all similarity transformation operators over the 
same set of square matrices of the same size forms a group relative to multipli¬ 
cation. 



CHAPTER 8 


The Characteristic Polynomial 


65. Eigenvalues and eigenvectors 

Let A be a linear operator in a space X. This 
means that each vector $x \in X$ is assigned some vector $y = Ax$ of 
the same space X. It may turn out that for some nonzero vector x 
its image and inverse image are collinear. As we shall see in what 
follows, such a situation substantially simplifies the study of the 
operator. 

A number $\lambda$ is said to be an eigenvalue and a nonzero vector x is 
said to be an eigenvector of a linear operator A if they are connected 
by the relation $Ax = \lambda x$. 

Notice that if x is an eigenvector corresponding to an eigenvalue $\lambda$, 
then any collinear vector $\alpha x$ with $\alpha \neq 0$ is also an eigenvector. 
If there are two eigenvectors, x and y, corresponding to an eigenvalue 
$\lambda$, then any nonzero vector of the form $\alpha x + \beta y$ is an 
eigenvector. By definition a zero vector is not an eigenvector. Therefore 
the set $X_\lambda$ of all eigenvectors which are linear combinations of any 
number of given eigenvectors corresponding to the same eigenvalue $\lambda$ 
is not a subspace. If, however, we extend $X_\lambda$ by adjoining the zero 
vector, then $X_\lambda$ becomes a subspace. We call it a proper subspace 
of A corresponding to the eigenvalue $\lambda$. 

It is not hard to see that the eigenvectors of the operators 0, E 
and $\alpha E$ are all the nonzero vectors of X. Each of these operators has 
only one eigenvalue, 0, 1 and $\alpha$, respectively, and hence a single 
proper subspace, which coincides with the entire space X. The 
projection operator P has two collections of eigenvectors: all nonzero 
vectors in the range of P and all nonzero vectors in the range of 
$E - P$. To the first collection there corresponds the eigenvalue 
$\lambda = 1$ and to the second the eigenvalue $\lambda = 0$. Indeed, since 
$P^2 = P$, we have 

$$P(Px) = P^2 x = Px = 1 \cdot Px,$$

$$P((E - P)x) = (P - P^2)x = (P - P)x = 0 = 0 \cdot (E - P)x.$$

Consequently, a projection operator different from 0 and E has at 
least two proper subspaces. 

Theorem 65.1. A system of eigenvectors $x_1, x_2, \ldots, x_m$ of an operator A 
corresponding to mutually distinct eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$ is 
linearly independent. 

Proof. Eigenvectors are nonzero by definition and therefore the 
theorem is clearly true for m = 1. Let it be true for any system 
of m - 1 eigenvectors but false for vectors $x_1, x_2, \ldots, x_m$. Then the 
system of those vectors will be linearly dependent, i.e. for some 
numbers $\alpha_1, \alpha_2, \ldots, \alpha_m$ not all zero 

$$\alpha_1 x_1 + \alpha_2 x_2 + \ldots + \alpha_m x_m = 0. \tag{65.1}$$

Suppose, renumbering the vectors if necessary, that $\alpha_1 \neq 0$. 
Applying A to (65.1) we get 

$$\alpha_1 \lambda_1 x_1 + \alpha_2 \lambda_2 x_2 + \ldots + \alpha_m \lambda_m x_m = 0. \tag{65.2}$$

On multiplying (65.1) by $\lambda_m$ and subtracting it from (65.2) we find 

$$\alpha_1 (\lambda_1 - \lambda_m) x_1 + \alpha_2 (\lambda_2 - \lambda_m) x_2 + \ldots + \alpha_{m-1} (\lambda_{m-1} - \lambda_m) x_{m-1} = 0.$$

By the induction hypothesis it follows that all coefficients of 
$x_1, x_2, \ldots, x_{m-1}$ are zero. In particular, $\alpha_1 (\lambda_1 - \lambda_m) = 0$, which 
contradicts the hypothesis that $\lambda_1 \neq \lambda_m$ and the assumption that 
$\alpha_1 \neq 0$. Hence the system of vectors $x_1, x_2, \ldots, x_m$ is linearly 
independent. 

Corollary. No linear operator in an m-dimensional space can have 
more than m mutually distinct eigenvalues. 

Of particular interest is the case when in an m-dimensional space 
the operator A has m mutually distinct eigenvalues. Then by Theo¬ 
rem 65.1 we can choose a basis of the space consisting entirely of 
eigenvectors of A. 

A linear operator A in an m-dimensional space X is said to be an 
operator of a simple structure if it has m linearly independent eigen¬ 
vectors. 

The reason for singling out operators of a simple structure among all 
linear operators is easily explained. These and only these 
operators have diagonal matrices in some basis. Indeed, let 
$x_1, x_2, \ldots, x_m$ be linearly independent eigenvectors of an operator A. 
Take them as basis vectors of the space X and construct the matrix 
of A in that basis. We have 

$$\begin{aligned} A x_1 &= \lambda_1 x_1, \\ A x_2 &= \lambda_2 x_2, \\ &\;\;\vdots \\ A x_m &= \lambda_m x_m. \end{aligned}$$

We recall that the column elements of the matrix of an operator coincide 
with the coordinates of the images of the basis vectors. Therefore the 
matrix $A_x$ of the operator A has the following form in a basis 
consisting of eigenvectors: 

$$A_x = \begin{pmatrix} \lambda_1 & & & \\ & \lambda_2 & & \\ & & \ddots & \\ & & & \lambda_m \end{pmatrix}.$$

Conversely, if A has in some basis $x_1, x_2, \ldots, x_m$ a diagonal matrix 
with some, not necessarily different, numbers $\lambda_1, \lambda_2, \ldots, \lambda_m$ on the 
principal diagonal, then $x_1, x_2, \ldots, x_m$ are eigenvectors of A 
corresponding to the eigenvalues $\lambda_1, \lambda_2, \ldots, \lambda_m$. 

Thus operators of a simple structure, and operators of a simple 
structure alone, have diagonal matrices in some basis. That basis 
can be made up only of eigenvectors of the operator A. The action 
of any operator of a simple structure always reduces to a “stretching” 
of the coordinates of a vector in the given basis. If all linear operators 
had a simple structure, then the question of choosing a basis in which 
the matrix of an operator has the simplest form would have been 
completely solved. However, operators of a simple structure do not 
exhaust all linear operators. 
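
The reduction of an operator of a simple structure to diagonal form can be checked on a small example. The sketch below takes a 2-by-2 matrix with two distinct eigenvalues, places its eigenvectors in the columns of a matrix P, and forms $P^{-1} A P$ according to (64.5). The matrix, the eigenvectors and the helper names are assumptions made for illustration only; the eigenvectors were found by hand, not by the code.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def inverse_2x2(P):
    """Inverse of a nonsingular 2x2 matrix via the adjugate formula."""
    (a, b), (c, d) = P
    det = a * d - b * c
    if det == 0:
        raise ValueError("columns of P are linearly dependent")
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical operator matrix in the original basis.
A = [[Fraction(4), Fraction(1)],
     [Fraction(2), Fraction(3)]]

# Eigenvectors x1 = (1, -2) for the eigenvalue 2 and x2 = (1, 1) for the
# eigenvalue 5, written as the columns of P.
P = [[Fraction(1), Fraction(1)],
     [Fraction(-2), Fraction(1)]]

# Matrix of the same operator in the basis of eigenvectors.
D = matmul(matmul(inverse_2x2(P), A), P)
```

In this example D is diagonal with the eigenvalues 2 and 5 on the principal diagonal, in the order in which the eigenvectors were placed in P.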

Exercises 

1. Let an operator A have an eigenvector x corresponding 
to an eigenvalue $\lambda$. Prove that for the operator 

$$\alpha_0 E + \alpha_1 A + \ldots + \alpha_n A^n,$$

where $\alpha_0, \alpha_1, \ldots, \alpha_n$ are some numbers, the vector x is also an 
eigenvector but that it corresponds to the eigenvalue 
$\alpha_0 + \alpha_1 \lambda + \ldots + \alpha_n \lambda^n$. 

2. Prove that operators A and $A - \alpha E$ have the same eigenvectors for any 
operator A and any number $\alpha$. 

3. Prove that an operator A is nonsingular if and only if it has no zero 
eigenvalues. 

4. Prove that operators A and $A^{-1}$ have the same eigenvectors for any 
nonsingular operator A. What is the connection between the eigenvalues of 
the operators? 

5. Prove that if an operator A is of a simple structure, then the operator 

$$\alpha_0 E + \alpha_1 A + \ldots + \alpha_n A^n$$

is also of a simple structure. 

6. Prove that an operator of differentiation in a space of polynomials is not 
an operator of a simple structure. Find the eigenvectors and eigenvalues of 
that operator. 

7. Consider a similarity transformation operator with a diagonal matrix. 
Prove that that operator is of a simple structure. Find all its eigenvectors and 
eigenvalues. 


66. The characteristic polynomial 

Not every linear operator has an eigenvector. 
Suppose, for example, that we have an operator in the 
space $V_2$ which turns every directed line segment about the origin 
counterclockwise through an angle of 90°. It is obvious that in that 
case image and inverse image will never be collinear and the operator 
will have no eigenvector. To study the question of the existence 
of eigenvectors we first introduce an equation satisfied by all 
eigenvalues of a linear operator. 

Let A be a linear operator in an m-dimensional space X over 
a field P. If A has an eigenvalue $\lambda$ corresponding to an eigenvector x, 
then by definition $Ax = \lambda x$ or equivalently 

$$(\lambda E - A) x = 0. \tag{66.1}$$

Since x is nonzero, from (66.1) it follows that the operator $\lambda E - A$ 
is singular. Thus the eigenvalues of A are those and only those numbers 
$\lambda$ from P for which $\lambda E - A$ is singular. 

Fix in X some basis $e_1, e_2, \ldots, e_m$ and denote by $A_e$ the matrix 
of A in that basis. The operator $\lambda E - A$ is singular if and only if 
so is its matrix $\lambda E - A_e$, i.e. if 

$$\det(\lambda E - A_e) = 0. \tag{66.2}$$

The definition of eigenvalues did not depend on the choice of basis 
in X. Therefore the numbers $\lambda$ from P satisfying (66.2) must not 
depend on the choice of basis either. In fact the left-hand side 
of (66.2) itself is independent of the choice of basis for any $\lambda$, 
although formally it appears to depend on it. Suppose in some other basis 
$f_1, f_2, \ldots, f_m$ the operator A has a matrix $A_f$. According to (64.5) 
$A_e$ and $A_f$ are related by 

$$A_f = Q^{-1} A_e Q$$

for some nonsingular matrix Q. Now for any $\lambda$ from P we find 

$$\begin{aligned} \det(\lambda E - A_f) &= \det(\lambda Q^{-1} E Q - Q^{-1} A_e Q) = \det(Q^{-1} (\lambda E - A_e) Q) \\ &= \det Q^{-1} \det(\lambda E - A_e) \det Q = (\det Q)^{-1} \det(\lambda E - A_e) \det Q \\ &= \det(\lambda E - A_e). \end{aligned}$$

Taking into account the expression for the determinant of a 
matrix in terms of its elements it is easy to see that the left-hand 
side of (66.2) can be represented as follows: 

$$\det(\lambda E - A_e) = a_0 + a_1 \lambda + \ldots + a_m \lambda^m. \tag{66.3}$$

The coefficients $a_0, \ldots, a_m$ are calculated in some way from the 
elements of $A_e$ and are independent of $\lambda$. The highest power of $\lambda$ 
enters only into the product of the diagonal elements of $\lambda E - A_e$ 
and therefore 

$$a_m = 1.$$

We give explicit expressions for two more coefficients, namely 

$$a_0 = (-1)^m \det A_e, \qquad a_{m-1} = -\operatorname{tr} A_e.$$

One might suppose that expanding the determinant 
$\det(\lambda E - A_e)$ in powers of $\lambda$ in different ways could yield 
expressions of the type of the right-hand side of (66.3) but with 
different coefficients $a_i$. It will be shown later on, however, that 
this is not so. The coefficients on the right of (66.3) 
are independent of the way they are calculated. Considering the 
independence of the determinant $\det(\lambda E - A_e)$ from the basis we 
conclude that all coefficients $a_0, \ldots, a_m$ are in fact characteristics 
of the operator A. The function 

$$f(\lambda) = a_0 + a_1 \lambda + \ldots + a_m \lambda^m \tag{66.4}$$

is called the characteristic polynomial of the operator A. 
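
The coefficients of the characteristic polynomial can be computed for a concrete matrix and compared with the expressions $a_m = 1$, $a_{m-1} = -\operatorname{tr} A_e$ and $a_0 = (-1)^m \det A_e$ given above. The sketch below uses the Faddeev-LeVerrier recurrence, which is not the method of this text but produces the same coefficients; the function names and the sample matrices are our own assumptions.

```python
from fractions import Fraction

def matmul(X, Y):
    """Product of two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def char_poly(A):
    """Return [a_0, a_1, ..., a_m] with det(lambda*E - A) = sum a_i lambda^i.

    Faddeev-LeVerrier recurrence (a swapped-in technique, not the text's):
    M_0 = 0, a_m = 1, M_k = A M_{k-1} + a_{m-k+1} E,
    a_{m-k} = -tr(A M_k) / k.
    """
    m = len(A)
    A = [[Fraction(x) for x in row] for row in A]
    a = [Fraction(0)] * (m + 1)
    a[m] = Fraction(1)                          # leading coefficient
    M = [[Fraction(0)] * m for _ in range(m)]   # M_0 = 0
    for k in range(1, m + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (a[m - k + 1] if i == j else 0) for j in range(m)]
             for i in range(m)]
        AMk = matmul(A, M)
        a[m - k] = -sum(AMk[i][i] for i in range(m)) / k
    return a

# Rotation of the plane through 90 degrees: lambda^2 + 1, no real roots.
rotation = char_poly([[0, -1], [1, 0]])

# A triangular 3x3 example with diagonal elements 1, 3, 4.
triangular = char_poly([[1, 2, 0], [0, 3, 0], [0, 0, 4]])
```

For the triangular example the trace is 8 and the determinant is 12, so one expects $a_2 = -8$ and $a_0 = (-1)^3 \cdot 12 = -12$.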

Associated with every linear operator is a characteristic polynomial. 
The converse is also true. Every polynomial of the form (66.4) is 
a characteristic polynomial of some linear operator. This may be, 
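for example, an operator whose matrix $A_e$ in some basis has the 
following form: 

$$A_e = \begin{pmatrix} -a_{m-1} & -a_{m-2} & \cdots & -a_1 & -a_0 \\ 1 & 0 & \cdots & 0 & 0 \\ 0 & 1 & \cdots & 0 & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & 0 \end{pmatrix}. \tag{66.5}$$
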
This is easy to see from a direct check, using the Laplace theorem 
for calculating det (XE — A e ). A matrix of the form (66.5) is called 
a Frobenius matrix. 
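
The direct check mentioned above can also be carried out numerically. The sketch below builds a matrix of the form (66.5) for one cubic polynomial and evaluates $\det(\lambda E - A_e)$ by cofactor expansion along the first row at several integer values of $\lambda$. The polynomial and all function names are assumptions chosen for illustration; agreement at more than three points is what one expects for polynomials of degree three.

```python
def det(M):
    """Determinant by Laplace (cofactor) expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def frobenius_matrix(a):
    """Matrix (66.5) for a[0] + a[1]*lam + ... + a[m-1]*lam^(m-1) + lam^m."""
    m = len(a)
    rows = [[-a[m - 1 - j] for j in range(m)]]      # -a_{m-1}, ..., -a_0
    for i in range(1, m):
        rows.append([1 if j == i - 1 else 0 for j in range(m)])
    return rows

def char_value(A, lam):
    """Value of det(lam*E - A) at the number lam."""
    m = len(A)
    return det([[(lam if i == j else 0) - A[i][j] for j in range(m)]
                for i in range(m)])

# Hypothetical example: f(lam) = lam^3 - 6 lam^2 + 11 lam - 6.
a = [-6, 11, -6]
F = frobenius_matrix(a)

def f(lam):
    return a[0] + a[1] * lam + a[2] * lam ** 2 + lam ** 3

agree = all(char_value(F, lam) == f(lam) for lam in range(-3, 5))
```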

For a number $\lambda$ from P to be an eigenvalue of an operator A it is 
necessary and sufficient that it should satisfy the equation 

$$a_0 + a_1 \lambda + \ldots + a_m \lambda^m = 0,$$

i.e. that it should be a root of the characteristic polynomial. Not 
every polynomial with coefficients from a field P has a root in P. 
As an illustration, $\lambda^2 + 1$ has no roots either 
in the field of rationals or in the field of reals. 

A field P is said to be algebraically closed if any polynomial of 
degree at least one with coefficients from P has at least one root from P. 

Thus if a linear operator acts in a space over an algebraically 
closed field, it must have at least one eigenvector. It is possible to 
construct various algebraically closed fields, but it is only one of 
them, the field of complex numbers, that is of the greatest practical 
value. To prove the algebraic closure of this field is the aim of our 
immediate studies. 

Exercises 

1. Find the characteristic polynomial for the zero and 
the identity operator. 

2. Find the characteristic polynomial for the operator of differentiation. 

3. Is the coincidence of characteristic polynomials an indication of the 
equality of the operators? 

4. Prove that operators with matrices A and A' have the same 
characteristic polynomials. 

5. Suppose that in some basis an operator has matrix (66.5). Find the 
coordinates of the eigenvectors in the same basis. 

6. Prove that an operator with matrix (66.5) has a simple structure if and 
only if the characteristic polynomial has m mutually distinct roots. 

67. The polynomial ring 

In some exercises and examples we have already 
noted the algebraic properties of polynomials. In connection with 
the investigation of the characteristic polynomial we shall continue 
those studies. 

Let P be an arbitrary field. Consider a set of polynomials, i.e. 
functions of the form 

$$f(z) = a_0 + a_1 z + \ldots + a_n z^n \tag{67.1}$$

dependent on an independent variable z assuming values from P and 
having coefficients $a_0, \ldots, a_n$ from P. A polynomial f(z) is said to 
be a polynomial of degree n if $a_n \neq 0$ and all coefficients with larger 
indices are zero. The only polynomial without a definite degree is 
the one all of whose coefficients are zero. We shall call it the zero 
polynomial and denote it by 0. 

Two polynomials are said to be equal if so are all their coefficients 
of equal powers of the independent variable. 

Now let f(z) and g(z) be polynomials of degree n and s 
respectively. Also let 

$$\begin{aligned} f(z) &= a_0 + a_1 z + \ldots + a_{n-1} z^{n-1} + a_n z^n, \\ g(z) &= b_0 + b_1 z + \ldots + b_{s-1} z^{s-1} + b_s z^s \end{aligned} \tag{67.2}$$

and suppose for definiteness that $n \geq s$. A sum f(z) + g(z) of f(z) 
and g(z) is a polynomial 

$$f(z) + g(z) = c_0 + c_1 z + \ldots + c_{n-1} z^{n-1} + c_n z^n,$$

where $c_i = a_i + b_i$ for $i \leq s$ and $c_i = a_i$ for $i > s$. The degree of 
the sum of the polynomials is n if n > s, but for n = s it is less 
than n if $b_n = -a_n$. 

A product $f(z) \cdot g(z)$ of f(z) and g(z) is a polynomial 

$$f(z) \cdot g(z) = d_0 + d_1 z + \ldots + d_{n+s-1} z^{n+s-1} + d_{n+s} z^{n+s},$$

where 

$$d_i = \sum_{j+k=i} a_j b_k$$

for $i = 0, 1, \ldots, n + s$. A coefficient $d_i$ is the sum of the products of 
those coefficients of f(z) and g(z) the sum of whose indices is i. For 
example, 

$$d_0 = a_0 b_0, \qquad d_{n+s} = a_n b_s.$$

From the last equation it follows that $d_{n+s} \neq 0$ and therefore the 
degree of the product of nonzero polynomials is equal to the sum 
of the degrees of the factors. Hence a product of nonzero polynomials 
is a nonzero polynomial. 
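
The rules for the coefficients of a sum and a product translate directly into operations on lists of coefficients. The following sketch represents a polynomial by the list $[a_0, a_1, \ldots, a_n]$ and the zero polynomial by the empty list; this representation and the function names are our own conventions, introduced only for illustration.

```python
def trim(c):
    """Drop trailing zero coefficients; the zero polynomial becomes []."""
    c = list(c)
    while c and c[-1] == 0:
        c.pop()
    return c

def poly_add(a, b):
    """Sum of two polynomials: c_i = a_i + b_i, missing terms count as 0."""
    n = max(len(a), len(b))
    a = a + [0] * (n - len(a))
    b = b + [0] * (n - len(b))
    return trim([x + y for x, y in zip(a, b)])

def poly_mul(a, b):
    """Product: d_i is the sum of a_j * b_k over all j + k = i."""
    if not a or not b:
        return []
    d = [0] * (len(a) + len(b) - 1)
    for j, aj in enumerate(a):
        for k, bk in enumerate(b):
            d[j + k] += aj * bk
    return trim(d)

def degree(c):
    """Degree of a nonzero polynomial; None for the zero polynomial."""
    return len(c) - 1 if c else None

f = [1, 2, 3]     # 1 + 2z + 3z^2
g = [4, 0, -3]    # 4 - 3z^2, leading coefficient opposite to that of f
s = poly_add(f, g)
p = poly_mul(f, g)
```

Here the leading coefficients of f and g cancel in the sum, so its degree drops below 2, while the degree of the product is the sum of the degrees of the factors.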

A special case of a product of polynomials is a product $\alpha f(z)$ of 
a polynomial f(z) by a number $\alpha$, since a nonzero number can be 
regarded as a polynomial of degree 0. 

A set of polynomials with the operations introduced above is 
a commutative ring. We shall not concern ourselves with checking 
that all the axioms hold. 

Theorem 67.1. For any polynomial f (z) and nonzero polynomial 
g (z) we can find unique polynomials q (z) and r (z) such that 

f(z) = g (z) q (z) + r (z), (67.3) 

with the degree of r (z) lower than that of g (z) or r (z) = 0. 

Proof. Let f(z) and g(z) be polynomials of degrees n and s. If 
n < s or f(z) = 0, then it is possible to set q(z) = 0 and 
r(z) = f(z) in (67.3). Suppose therefore that $n \geq s$. 

We represent f(z) and g(z) according to (67.2) and set 

$$f(z) - \frac{a_n}{b_s} z^{n-s} g(z) = f_1(z). \tag{67.4}$$

Let the degree of $f_1(z)$ be $n_1$ and let its leading coefficient be 
$a^{(1)}_{n_1}$. It is clear that $n_1 < n$. If $n_1 \geq s$, then we set 

$$f_1(z) - \frac{a^{(1)}_{n_1}}{b_s} z^{n_1-s} g(z) = f_2(z). \tag{67.5}$$

Denote by $n_2$ the degree of $f_2(z)$ and by $a^{(2)}_{n_2}$ its leading 
coefficient. If $n_2 \geq s$, then again we set 

$$f_2(z) - \frac{a^{(2)}_{n_2}}{b_s} z^{n_2-s} g(z) = f_3(z) \tag{67.6}$$

and so on. 

The degrees of $f_1(z), f_2(z), \ldots$ are decreasing. Therefore in a finite 
number of steps we arrive at the following equation: 

$$f_{k-1}(z) - \frac{a^{(k-1)}_{n_{k-1}}}{b_s} z^{n_{k-1}-s} g(z) = f_k(z), \tag{67.7}$$

where $f_k(z)$ is either zero or its degree $n_k$ is less than s. After that the 
process is stopped. 

Now adding all equations of the type (67.4) to (67.7) we get 

$$f(z) - \left( \frac{a_n}{b_s} z^{n-s} + \frac{a^{(1)}_{n_1}}{b_s} z^{n_1-s} + \ldots + \frac{a^{(k-1)}_{n_{k-1}}}{b_s} z^{n_{k-1}-s} \right) g(z) = f_k(z).$$

This means that the polynomials 

$$q(z) = \frac{a_n}{b_s} z^{n-s} + \frac{a^{(1)}_{n_1}}{b_s} z^{n_1-s} + \ldots + \frac{a^{(k-1)}_{n_{k-1}}}{b_s} z^{n_{k-1}-s}, \qquad r(z) = f_k(z)$$

satisfy equation (67.3), with either r(z) = 0 or the degree of r(z) 
less than that of g(z). 

We now prove that the polynomials q(z) and r(z) satisfying the 
hypothesis of the theorem are unique. Let there be other polynomials, 
q'(z) and r'(z), such that 

$$f(z) = g(z) q'(z) + r'(z),$$

with either r'(z) = 0 or the degree of r'(z) less than that of g(z). 
Then 

$$g(z) (q(z) - q'(z)) = r'(z) - r(z). \tag{67.8}$$

The polynomial on the right of this equation is either zero or its 
degree is less than that of g(z). But the polynomial on the left of 
this equation has a degree not less than that of g(z) if 
$q(z) - q'(z) \neq 0$. Therefore (67.8) is possible only if 

$$q(z) = q'(z), \qquad r(z) = r'(z).$$


This completes the proof of the theorem. 
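
The construction used in the proof is the familiar long division of polynomials, and it can be written as a short procedure. The sketch below repeats the step (67.4) to (67.7) until the current polynomial $f_k(z)$ is zero or has degree less than s. Polynomials are stored as coefficient lists $[c_0, c_1, \ldots]$ without trailing zeros, with the empty list for the zero polynomial; this storage convention and the name `poly_divmod` are assumptions of the example.

```python
from fractions import Fraction

def poly_divmod(f, g):
    """Return (q, r) with f = g*q + r and r = [] or deg r < deg g."""
    if not g:
        raise ZeroDivisionError("g must be a nonzero polynomial")
    r = [Fraction(c) for c in f]          # current f_k, starts as f
    s = len(g) - 1                        # degree of g
    q = [Fraction(0)] * max(len(r) - s, 0)
    while len(r) - 1 >= s:                # f_k still has degree >= s
        shift = len(r) - 1 - s            # exponent n_k - s
        coef = r[-1] / g[-1]              # ratio of leading coefficients
        q[shift] = coef
        for i, gi in enumerate(g):        # subtract coef * z^shift * g(z)
            r[i + shift] -= coef * gi
        while r and r[-1] == 0:           # remove the cancelled leading terms
            r.pop()
    return q, r

# z^3 - 2z + 5 divided by z^2 + 1
q1, r1 = poly_divmod([5, -2, 0, 1], [1, 0, 1])

# 2z^2 + 1 divided by 2z + 1 (fractions appear in the quotient)
q2, r2 = poly_divmod([1, 0, 2], [1, 2])
```

Exact rational arithmetic is used so that the cancellation of leading terms is not disturbed by rounding; over another field the same steps apply with that field's division.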

A polynomial q(z) is called the quotient of f(z) by g(z) and r(z) 
is the remainder. If the remainder is zero, then f(z) is said to be 
divisible by g(z) and g(z) is said to be a divisor of f(z). 

Consider division of a nonzero polynomial f(z) by a first-degree 
polynomial (z - a). We have 

$$f(z) = (z - a) q(z) + r(z). \tag{67.9}$$

Since the degree of r(z) must be less than that of (z - a), r(z) is 
either zero or a polynomial of degree zero, i.e. a constant. That 
constant is easy to determine. On substituting z = a on the right and 
left of (67.9) we find that r = f(a). So 

$$f(z) = (z - a) q(z) + f(a). \tag{67.10}$$

For f(z) to be divisible by (z - a) it is necessary and sufficient 
that f(a) = 0. Numbers a such that f(a) = 0 are usually called 
roots of f(z). Thus finding all linear divisors of a polynomial is 
equivalent to finding all its roots. 

Formula (67.10) allows the following conclusion to be drawn. For 
any a from P a polynomial f(z) of degree n can be uniquely 
represented as an expansion in powers of (z - a): 

$$f(z) = A_0 + A_1 (z - a) + \ldots + A_{n-1} (z - a)^{n-1} + A_n (z - a)^n, \tag{67.11}$$

where $A_0, \ldots, A_n$ are numbers from P. 

The existence of at least one expansion (67.11) can be established 
in a fairly simple way. On dividing f(z) by (z - a) we obtain a 
quotient $q_1(z)$ and a remainder $A_0$ related by 

$$f(z) = (z - a) q_1(z) + A_0. \tag{67.12}$$

If $q_1(z)$ is of degree zero, then expansion (67.11) is obtained. If, 
however, the degree of $q_1(z)$ is other than zero, then on dividing 
$q_1(z)$ by (z - a) we have 

$$q_1(z) = (z - a) q_2(z) + A_1. \tag{67.13}$$

Combining (67.12) and (67.13) we find 

$$f(z) = (z - a)^2 q_2(z) + A_1 (z - a) + A_0.$$

We again divide $q_2(z)$ by (z - a), if necessary, and so on. Since the 
degrees of the quotients $q_1(z), q_2(z), \ldots$ are successively decreasing, 
the process stops in n steps to yield expansion (67.11). 
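
Both the division by (z - a) in (67.10) and the repeated division leading to (67.11) are easy to carry out on coefficient lists. The sketch below divides by a linear polynomial using the scheme commonly known as Horner's (synthetic) division, which the text does not name, and then collects the successive remainders $A_0, A_1, \ldots, A_n$. The function names and the sample polynomial are assumptions for illustration.

```python
def divide_by_linear(f, a):
    """Divide f (coefficients [c_0, ..., c_n]) by (z - a).

    Returns (q, r): the quotient coefficients and the constant remainder,
    which equals f(a) by (67.10).
    """
    if not f:
        return [], 0
    n = len(f) - 1
    q = [0] * n
    carry = f[n]
    for i in range(n - 1, -1, -1):
        q[i] = carry
        carry = f[i] + a * carry
    return q, carry

def expand_about(f, a):
    """Coefficients [A_0, ..., A_n] of f in powers of (z - a), as in (67.11).

    The loop divides once more than the text does: the last division of a
    constant quotient simply returns that constant as A_n.
    """
    coeffs = []
    q = list(f)
    while q:
        q, remainder = divide_by_linear(q, a)
        coeffs.append(remainder)
    return coeffs

f = [-6, 11, -6, 1]                 # z^3 - 6z^2 + 11z - 6 = (z-1)(z-2)(z-3)
q_at_4, f_at_4 = divide_by_linear(f, 4)
q_at_1, f_at_1 = divide_by_linear(f, 1)
A = expand_about(f, 1)
```

Since 1 is a root of this polynomial, the remainder `f_at_1` vanishes and the expansion about a = 1 has $A_0 = 0$.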

Suppose now that an expansion of the same form has been obtained 
in some other way and has coefficients $A'_0, \ldots, A'_n$. On letting 

$$q'_i(z) = A'_i + A'_{i+1} (z - a) + \ldots + A'_n (z - a)^{n-i}$$

for $i = 0, 1, \ldots, n$ we conclude that for i < n 

$$q'_i(z) = (z - a) q'_{i+1}(z) + A'_i, \tag{67.14}$$

with $q'_0(z) = f(z)$ of course. Comparing (67.12) with (67.14), when 
i = 0, and considering the uniqueness of quotient and remainder 
we conclude that $A_0 = A'_0$ and $q_1(z) = q'_1(z)$. Similarly we can prove 
the equality of the other coefficients. 


Exercises 

1. Prove that there are no zero divisors in a polynomial 
ring. 

2. Suppose that for some polynomials $f(z) \varphi(z) = g(z) \varphi(z)$. Prove that if 
$\varphi(z) \neq 0$, then f(z) = g(z). 

3. Prove that nonzero polynomials f(z) and g(z) are divisible by each other 
if and only if $g(z) = \alpha f(z)$ for a nonzero number $\alpha$. 

4. Let each of the polynomials $f_1(z), \ldots, f_k(z)$ be divisible by $\varphi(z)$. Prove 
that so is $f_1(z) g_1(z) + \ldots + f_k(z) g_k(z)$, where $g_1(z), \ldots, g_k(z)$ are 
arbitrary polynomials. 

5. Prove that in expansions (67.1) and (67.11) for the same polynomial f(z) 
the coefficients $a_n$ and $A_n$ coincide. 

68. The fundamental theorem of algebra 

We proceed to prove one of the most important 
statements, the theorem on the algebraic closure of the field of com¬ 
plex numbers. This theorem has application in various fields of 
mathematics. In particular, it underlies all further theory of linear 
operators. Following the established tradition we shall call it the 
fundamental theorem of algebra. 

So we must prove that any polynomial of degree $n \geq 1$ with 
complex coefficients has at least one root, in general a complex root. 
We first consider polynomials of a special form, namely 

$$f(z) = a - z^n. \tag{68.1}$$

Let us represent complex numbers z in the so-called trigonometric 
form 

$$z = r (\cos\varphi + i \sin\varphi).$$

Here r is a nonnegative number called the absolute value or modulus 
of a number z, and $\varphi$ is a real number called the argument of z. It 
is clear that for every number z its absolute value is uniquely defined. 
For nonzero numbers z their argument is defined up to a multiple 
of $2\pi$; for z = 0 the argument is not defined. Forming the product 
of two complex numbers 

$$z = r (\cos\varphi + i \sin\varphi), \qquad v = \rho (\cos\psi + i \sin\psi),$$

we find 

$$zv = r\rho (\cos\varphi + i \sin\varphi)(\cos\psi + i \sin\psi) = r\rho (\cos(\varphi + \psi) + i \sin(\varphi + \psi)).$$

From this we deduce that 

$$z^n = r^n (\cos n\varphi + i \sin n\varphi).$$

This equation is called de Moivre's formula. It provides an easy 
way of finding the roots of polynomial (68.1). Indeed, let a nonzero 
complex number a be represented in trigonometric form 

$$a = \alpha (\cos\theta + i \sin\theta).$$

The equation 

$$a - z^n = 0$$

in z is equivalent to 

$$\alpha (\cos\theta + i \sin\theta) = r^n (\cos n\varphi + i \sin n\varphi)$$

in r and $\varphi$. But the last equation clearly has the following solutions: 

$$r = +\sqrt[n]{\alpha}, \qquad \varphi = \frac{\theta + 2\pi k}{n}$$

for $k = 0, 1, 2, \ldots, n - 1$. Hence the complex numbers 

$$a_k = +\sqrt[n]{\alpha} \left( \cos\frac{\theta + 2\pi k}{n} + i \sin\frac{\theta + 2\pi k}{n} \right) \tag{68.2}$$

are the roots of (68.1). We shall call them the nth roots of a and 
designate them as 

$$a_k = \sqrt[n]{a}.$$
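
Formula (68.2) can be evaluated directly in floating-point arithmetic. The sketch below uses the standard `cmath.polar` and `cmath.rect` functions to pass between a complex number and its trigonometric form; the function name `nth_roots` and the sample number are our own, and the results are approximate because of rounding.

```python
import cmath
import math

def nth_roots(a, n):
    """All n values of the nth root of a nonzero complex number a, by (68.2)."""
    modulus, theta = cmath.polar(a)      # a = modulus*(cos theta + i sin theta)
    r = modulus ** (1.0 / n)             # the positive real nth root
    return [cmath.rect(r, (theta + 2 * math.pi * k) / n) for k in range(n)]

roots = nth_roots(-8, 3)                 # the three cube roots of -8
residuals = [abs(z ** 3 + 8) for z in roots]
```

For a = -8 and n = 3 the modulus of every root is 2, and one of the three roots is the real number -2.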

Now let f(z) be a polynomial with complex coefficients. We 
consider it to be a complex function of the complex independent 
variable z. For such functions, as for real-valued functions of a real 
independent variable, it is possible to introduce the concepts of 
continuity, of derivative and so on. Not all of these notions will be 
equally necessary to us, but they are all based on the use of the 
completeness of the space of complex numbers. 

A one-valued complex function f(z) of a complex independent 
variable z is said to be continuous at a point $z_0$ if for any arbitrarily 
small number $\varepsilon > 0$ we can find $\delta > 0$ such that for any complex 
number z satisfying 

$$|z - z_0| < \delta$$

we have 

$$|f(z) - f(z_0)| < \varepsilon.$$

A function f(z) continuous at each point of its domain is called 
everywhere continuous or simply continuous. 

Lemma 68.1. A polynomial f(z) with complex coefficients is a 
continuous function of a complex independent variable z. 

Proof. Let 

$$f(z) = a_0 + a_1 z + \ldots + a_n z^n \tag{68.3}$$

and let $z_0$ be an arbitrary fixed complex number. Denote $h = z - z_0$. 
We show that for any arbitrarily small number $\varepsilon > 0$ we can find 
$\delta > 0$ such that $|f(z) - f(z_0)| < \varepsilon$ for $|h| < \delta$. 

On expanding the polynomial f(z) in powers of $(z - z_0)$ we get 

$$f(z) = A_0 + A_1 (z - z_0) + \ldots + A_n (z - z_0)^n.$$

Since $A_0 = f(z_0)$ and $(z - z_0)$ is denoted by h, we have 

$$f(z_0 + h) - f(z_0) = A_1 h + \ldots + A_n h^n. \tag{68.4}$$

Hence 

$$|f(z_0 + h) - f(z_0)| \leq |A_1| |h| + \ldots + |A_n| |h|^n = A(|h|). \tag{68.5}$$

The real-valued function $A(|h|)$ is a polynomial with real 
coefficients $|A_i|$ in a real variable $|h|$. As is known from mathematical 
analysis, $A(|h|)$ is a continuous function everywhere and, in 
particular, for $|h| = 0$. Since A(0) = 0, given $\varepsilon > 0$ we can find 
$\delta > 0$ such that for 

$$|h| < \delta \tag{68.6}$$

we have 

$$A(|h|) < \varepsilon.$$

Taking into account inequality (68.5) we conclude that if (68.6) 
holds so does the inequality 

$$|f(z_0 + h) - f(z_0)| < \varepsilon.$$

Corollary. The absolute value of a polynomial is a continuous 
function. 

This statement is immediate from the following relation: 

$$\bigl|\, |f(z)| - |f(z_0)| \,\bigr| \leq |f(z) - f(z_0)|.$$

Corollary. If a sequence of complex numbers $\{z_k\}$ converges to $z_0$, 
then for any polynomial f(z) 

$$\lim_{k \to \infty} f(z_k) = f(z_0).$$

Lemma 68.2. If a polynomial f(z) of degree $n \geq 1$ does not vanish 
for $z = z_0$, then we can always find a complex number h such that 

$$|f(z_0 + h)| < |f(z_0)|.$$

Proof. Again consider expansion (68.4). Let $A_k$ be the first nonzero 
coefficient among $A_1, A_2, \ldots, A_n$. Take 

$$h = t \sqrt[k]{-\frac{f(z_0)}{A_k}}, \tag{68.7}$$

where we take as the kth root any of its values and 

$$0 < t < 1. \tag{68.8}$$

Let 

$$B_i = A_i \left( \sqrt[k]{-\frac{f(z_0)}{A_k}} \right)^{i}, \qquad i = k + 1, \ldots, n,$$

with the same value of the root as in (68.7). Now from (68.4), taking 
into account (68.7) and (68.8), we find 

$$\begin{aligned} |f(z_0 + h)| &= |f(z_0) - t^k f(z_0) + t^{k+1} B_{k+1} + \ldots + t^n B_n| \\ &\leq |(1 - t^k) f(z_0)| + t^{k+1} |B_{k+1}| + \ldots + t^n |B_n| \\ &= (1 - t^k) |f(z_0)| + t^{k+1} |B_{k+1}| + \ldots + t^n |B_n| \\ &= |f(z_0)| + t^k \left( -|f(z_0)| + t |B_{k+1}| + \ldots + t^{n-k} |B_n| \right) \\ &= |f(z_0)| + t^k B(t). \end{aligned}$$

Finally we have 

$$|f(z_0 + h)| \leq |f(z_0)| + t^k B(t).$$

The function B(t) is a polynomial with real coefficients and real 
independent variable t. It is a continuous function. But 
$B(0) = -|f(z_0)| < 0$; therefore by virtue of the continuity of B(t) 
there is $t_0$ with $0 < t_0 < 1$ such that $B(t_0)$ is also negative. For 
a complex number h defined by the number $t_0$ according to (68.7) 
we get 

$$|f(z_0 + h)| \leq |f(z_0)| + t_0^k B(t_0) < |f(z_0)|.$$
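
The choice of h in the proof can be followed on a numerical example. The sketch below expands f about $z_0$, finds the first nonzero coefficient $A_k$, takes one value of the kth root in (68.7), and halves t until the absolute value actually decreases. It works in floating-point arithmetic, so the test for a nonzero coefficient is exact only for simple inputs; the function names, the halving strategy and the sample polynomial are assumptions of this illustration and not part of the text.

```python
def horner(f, z):
    """Value at z of the polynomial with coefficients [c_0, ..., c_n]."""
    value = 0
    for c in reversed(f):
        value = value * z + c
    return value

def expand_about(f, z0):
    """Coefficients [A_0, ..., A_n] of f in powers of (z - z0)."""
    coeffs, q = [], list(f)
    while q:
        n = len(q) - 1
        nxt, carry = [0] * n, q[n]
        for i in range(n - 1, -1, -1):      # synthetic division by (z - z0)
            nxt[i] = carry
            carry = q[i] + z0 * carry
        coeffs.append(carry)
        q = nxt
    return coeffs

def decreasing_step(f, z0):
    """Return h with |f(z0 + h)| < |f(z0)|, following (68.7)."""
    A = expand_about(f, z0)
    f0 = A[0]
    if f0 == 0:
        raise ValueError("f(z0) must be nonzero")
    k = next(i for i in range(1, len(A)) if A[i] != 0)
    root = complex(-f0 / A[k]) ** (1.0 / k)  # one value of the kth root
    t = 0.5
    while abs(horner(f, z0 + t * root)) >= abs(f0):
        t /= 2                               # smaller t until B(t) < 0
    return t * root

f = [1, 0, 1]                # z^2 + 1, with f(0) = 1 and A_1 = 0, so k = 2
h = decreasing_step(f, 0)
before, after = abs(horner(f, 0)), abs(horner(f, h))
```

For $z^2 + 1$ at $z_0 = 0$ no real h decreases the absolute value, and the procedure returns a purely imaginary step, up to rounding.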

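Lemma 68.3. For any polynomial f(z) of degree $n \geq 1$ and any 
infinitely large sequence $\{z_k\}$ of complex numbers there is a limiting 
relation 

$$\lim_{k \to \infty} |f(z_k)| = +\infty. \tag{68.9}$$

Proof. Consider polynomial (68.3). For any $z \neq 0$ we find 

$$|f(z)| \geq |a_n| |z|^n \left( 1 - \frac{|a_{n-1}|}{|a_n|} |z|^{-1} - \ldots - \frac{|a_0|}{|a_n|} |z|^{-n} \right). \tag{68.10}$$

Since $\{z_k\}$ is infinitely large, 

$$\lim_{k \to \infty} |z_k| = +\infty.$$

The right-hand side of relation (68.10) is a real-valued function, 
and it is easy to see that 

$$\lim_{k \to \infty} \left( 1 - \frac{|a_{n-1}|}{|a_n|} |z_k|^{-1} - \ldots - \frac{|a_0|}{|a_n|} |z_k|^{-n} \right) = 1.$$

But for the other factor of (68.10) we have 

$$\lim_{k \to \infty} |a_n| |z_k|^n = +\infty.$$

Consequently, (68.9) is true. 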

Theorem 68.1 (fundamental theorem of algebra). Any polynomial 
f(z) of degree $n \geq 1$ with complex coefficients has at least one root, in 
general a complex root. 

Proof. Consider the set of all possible absolute values of f(z). 
Since $|f(z)| \geq 0$, that set is bounded below. It is known from 
mathematical analysis that any nonempty set of real numbers bounded 
below has a greatest lower bound. Let it be l for the set of values 
$|f(z)|$. This means that for every natural k we can find a complex 
number $z_k$ such that 

$$0 \leq |f(z_k)| - l < 2^{-k}.$$

It follows that 

$$\lim_{k \to \infty} |f(z_k)| = l. \tag{68.11}$$

Assuming $\{z_k\}$ to be unbounded it would be possible to choose an 
infinitely large subsequence of it, and according to Lemma 68.3 
relation (68.11) could not hold. Therefore $\{z_k\}$ is bounded. Choose a 
convergent subsequence $\{z_{k_\nu}\}$ of it and let 

$$\lim_{\nu \to \infty} z_{k_\nu} = z_0.$$

According to a corollary of Lemma 68.1 the absolute value of a 
polynomial is a continuous function. Hence 

$$|f(z_0)| = \lim_{\nu \to \infty} |f(z_{k_\nu})| = l.$$

If $l \neq 0$, then from Lemma 68.2 it follows that there is a number 
$z'_0$ such that $|f(z'_0)| < l$. This contradicts the fact that l is the 
greatest lower bound of the absolute values of the polynomial and 
therefore l = 0. 

So we have shown that there is a complex number $z_0$ such that 
$|f(z_0)| = 0$ or equivalently 

$$f(z_0) = 0.$$

This means that $z_0$ is a root of f(z). 


Exercises 


1. Prove that the set of all nth roots of the complex 
number 1 forms a commutative group relative to multiplication. 

2. Prove that for a sequence of complex numbers $\{z_k\}$ to be bounded it is 
necessary and sufficient that for at least one polynomial f(z) of degree $n \geq 1$ 
the sequence $\{f(z_k)\}$ should be bounded. 

3. Prove that for any polynomial f(z) of degree $n \geq 1$ and for any complex 
number $z_0$ there is a complex number h such that $|f(z_0 + h)| > |f(z_0)|$. 

4. Prove that all roots of polynomial (68.3) are in the ring 

$$\left( 1 + \max_{k > 0} \left| \frac{a_k}{a_0} \right| \right)^{-1} < |z| < 1 + \max_{k < n} \left| \frac{a_k}{a_n} \right|.$$

5. Try to “prove” the algebraic closure of a field of real numbers according 
to the same scheme as that for complex numbers. In what place has the “proof” 
no analogy? 


69. Consequences of the fundamental theorem 

There arise a variety of consequences from the 
fundamental theorem. Let us consider the most important of them. 

A polynomial f(z) of degree $n \geq 1$ with complex coefficients has 
at least one root $z_1$. Therefore f(z) has a factorization 

$$f(z) = (z - z_1) \varphi(z),$$

where $\varphi(z)$ is a polynomial of degree n - 1. The coefficients of 
$\varphi(z)$ are again complex numbers. Consequently, $\varphi(z)$ has a root 
$z_2$ (if $n \geq 2$) and 

$$\varphi(z) = (z - z_2) \psi(z),$$

from which it follows that 

$$f(z) = (z - z_1)(z - z_2) \psi(z).$$

Continuing the process we obtain a representation of the polynomial 
as a product of linear factors: 

$$f(z) = b (z - z_1)(z - z_2) \ldots (z - z_n),$$

where b is some number. Removing the parentheses at the right and 
comparing the coefficients of the powers with the coefficients $a_i$ 
of f(z) we conclude that $b = a_n$. 

There may be equal numbers among $z_1, z_2, \ldots, z_n$. Suppose for 
simplicity that $z_1, \ldots, z_r$ are mutually distinct and that each of the 
numbers $z_{r+1}, \ldots, z_n$ is equal to one of the first r numbers. Then 
f(z) can be written as follows: 

$$f(z) = a_n (z - z_1)^{k_1} (z - z_2)^{k_2} \ldots (z - z_r)^{k_r}, \tag{69.1}$$

where $z_i \neq z_j$ for $i \neq j$ and 

$$k_1 + k_2 + \ldots + k_r = n.$$

Representation (69.1) is called a canonical factorization of a 
polynomial f(z). 

The canonical factorization is unique for a polynomial f(z) up 
to an arrangement of factors. Indeed, suppose that along with 
factorization (69.1) there is another canonical factorization 

$$f(z) = a_n (z - v_1)^{l_1} (z - v_2)^{l_2} \ldots (z - v_m)^{l_m}.$$

Then 

$$(z - z_1)^{k_1} (z - z_2)^{k_2} \ldots (z - z_r)^{k_r} = (z - v_1)^{l_1} (z - v_2)^{l_2} \ldots (z - v_m)^{l_m}. \tag{69.2}$$

Notice that the collection of numbers $z_1, \ldots, z_r$ must coincide 
with the collection of numbers $v_1, \ldots, v_m$. If, for example, $z_1$ is 
equal to none of the numbers $v_1, \ldots, v_m$, then substituting $z = z_1$ 
in (69.2) we obtain zero at the left and a nonzero number at the right. 
So if there are two canonical factorizations of f(z), then (69.2) may 
be only as follows: 

$$(z - z_1)^{k_1} (z - z_2)^{k_2} \ldots (z - z_r)^{k_r} = (z - z_1)^{l_1} (z - z_2)^{l_2} \ldots (z - z_r)^{l_r}.$$

Suppose, for example, that $k_1 \neq l_1$ and let $k_1 > l_1$ for definiteness. 
By dividing the right- and left-hand sides of the last equation by the 
same divisor $(z - z_1)^{l_1}$ we get 

$$(z - z_1)^{k_1 - l_1} (z - z_2)^{k_2} \ldots (z - z_r)^{k_r} = (z - z_2)^{l_2} \ldots (z - z_r)^{l_r}.$$

Substituting $z = z_1$ we again see that there is zero at the left and a 
nonzero number at the right. The uniqueness of the canonical 
factorization is thus proved. 

If $k_i = 1$ in the canonical factorization (69.1), then a root $z_i$ is 
said to be simple; if $k_i > 1$, then a root $z_i$ is said to be multiple. 
A number $k_i$ is the multiplicity of a root $z_i$. We can now draw a very 
important conclusion: 

Any polynomial of degree $n \geq 1$ with complex coefficients has n 
roots, each root counted according to its multiplicity. 

A polynomial of degree zero has no roots. The only polynomial that 
has arbitrarily many mutually distinct roots is the zero polynomial. 
These facts may be used to draw the following conclusion: 

If two polynomials f(z) and g(z) whose degrees do not exceed n have 
equal values for more than n distinct values of the independent variable, 
then all the corresponding coefficients of those polynomials are equal. 

Indeed, according to the assumption the polynomial f(z) - g(z) 
has more than n roots. But its degree does not exceed n and therefore 

$$f(z) - g(z) = 0.$$

So a polynomial f(z) whose degree is not greater than n is 
completely defined by its values for any n + 1 distinct values of the 
independent variable. This makes it possible to reconstruct the polynomial 
from its values. It is not hard to show an explicit form of this 
“reconstructing” polynomial. If for the values of the independent variable 
equal to $a_1, \ldots, a_{n+1}$ a polynomial f(z) assumes the values 
$f(a_1), \ldots, f(a_{n+1})$, then 

$$f(z) = \sum_{i=1}^{n+1} f(a_i) \frac{(z - a_1) \ldots (z - a_{i-1})(z - a_{i+1}) \ldots (z - a_{n+1})}{(a_i - a_1) \ldots (a_i - a_{i-1})(a_i - a_{i+1}) \ldots (a_i - a_{n+1})}.$$

It is clear that the degree of the polynomial at the right does not 
exceed n, and at the points $z = a_i$ it assumes the values $f(a_i)$. The 
polynomial thus constructed is called Lagrange's interpolation 
polynomial. 
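
The interpolation formula can be evaluated term by term. The sketch below computes the value of Lagrange's interpolation polynomial at a point z from a list of pairs $(a_i, f(a_i))$, using exact rational arithmetic; the function name `lagrange_value` and the sample data are assumptions made for illustration.

```python
from fractions import Fraction

def lagrange_value(points, z):
    """Value at z of Lagrange's interpolation polynomial.

    points is a list of pairs (a_i, f(a_i)) with mutually distinct a_i.
    """
    total = Fraction(0)
    for i, (ai, fi) in enumerate(points):
        term = Fraction(fi)
        for j, (aj, _) in enumerate(points):
            if j != i:
                term *= Fraction(z - aj) / Fraction(ai - aj)
        total += term
    return total

# Three values of f(z) = z^2 - 3z + 2, enough to fix a polynomial
# whose degree does not exceed 2.
samples = [(0, 2), (1, 0), (3, 2)]
value_at_2 = lagrange_value(samples, 2)
value_at_5 = lagrange_value(samples, 5)
```

Since the sampled polynomial has degree 2 and three points are given, the reconstructed values should coincide with $f(2) = 0$ and $f(5) = 12$.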

Consider now a polynomial $f(z)$ of degree $n$ and let
$z_1, z_2, \ldots, z_n$ be its roots repeated according to multiplicity.
Then

\[ f(z) = a_n (z - z_1)(z - z_2) \cdots (z - z_n). \]

Multiplying the parentheses at the right, collecting similar terms and
comparing the resulting coefficients with those in (68.3) we can
derive the following equations:

\[ \begin{aligned}
a_{n-1}/a_n &= -(z_1 + z_2 + \ldots + z_n), \\
a_{n-2}/a_n &= +(z_1 z_2 + z_1 z_3 + \ldots + z_1 z_n + \ldots + z_{n-1} z_n), \\
a_{n-3}/a_n &= -(z_1 z_2 z_3 + z_1 z_2 z_4 + \ldots + z_{n-2} z_{n-1} z_n), \\
&\;\;\vdots \\
a_1/a_n &= (-1)^{n-1} (z_1 z_2 \cdots z_{n-1} + \ldots + z_2 z_3 \cdots z_n), \\
a_0/a_n &= (-1)^n z_1 z_2 \cdots z_n.
\end{aligned} \]

These are called Vieta's formulas and express coefficients of the
polynomial in terms of its roots.

On the right of the kth equation is the sum of all possible products, 
k roots each, taken with a plus or a minus sign according as k is even 
or odd. 
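
Vieta's formulas are easy to check on a concrete polynomial. In the
sketch below the helper names `expand_from_roots` and
`elementary_symmetric`, as well as the chosen roots and leading
coefficient, are assumptions made for illustration. The polynomial is
expanded from its roots and each coefficient is compared with the
signed sum of products of $k$ roots.

```python
from itertools import combinations
from math import prod

def expand_from_roots(roots, leading=1):
    """Coefficients [a_0, ..., a_n] of leading * (z - z_1)...(z - z_n)."""
    coeffs = [leading]                 # lowest degree first
    for r in roots:
        shifted = [0] + coeffs                    # z * p(z)
        scaled = [-r * c for c in coeffs] + [0]   # -r * p(z)
        coeffs = [s + t for s, t in zip(shifted, scaled)]
    return coeffs

def elementary_symmetric(roots, k):
    """Sum of all products of k roots (roots taken with multiplicity)."""
    return sum(prod(c) for c in combinations(roots, k))

roots = [1, 2, 2, -3]                  # the root 2 has multiplicity 2
a = expand_from_roots(roots, leading=2)
n = len(roots)

# k-th Vieta equation: a_{n-k} / a_n = (-1)^k * (sum of products of k roots)
for k in range(1, n + 1):
    assert a[n - k] == a[n] * (-1) ** k * elementary_symmetric(roots, k)
print(a)  # [-24, 40, -14, -4, 2]
```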

For further discussion we shall need some consequences of the
fundamental theorem of algebra, relating to polynomials with real
coefficients. Let a polynomial

\[ f(z) = a_0 + a_1 z + \ldots + a_n z^n \]

with real coefficients have a complex (but not real) root $v$, i.e.

\[ a_0 + a_1 v + \ldots + a_n v^n = 0. \]

The last equation is not violated if all numbers are replaced in it by
complex conjugate ones. However, the coefficients $a_0, \ldots, a_n$ and
the number 0, being real, will remain unaffected by the replacement.
Therefore

\[ a_0 + a_1 \bar{v} + \ldots + a_n \bar{v}^n = 0, \]

i.e. $f(\bar{v}) = 0$.

Thus if a complex (but not real) number $v$ is a root of a polynomial
$f(z)$ with real coefficients, then so is the complex conjugate number
$\bar{v}$.

It follows that $f(z)$ will be divisible by a quadratic trinomial

\[ \varphi(z) = (z - v)(z - \bar{v}) = z^2 - (v + \bar{v}) z + v\bar{v} \]

with real coefficients. Using this fact we prove that $v$ and $\bar{v}$
have the same multiplicity.

Let them have multiplicities $k$ and $l$ respectively and let $k > l$,
for example. Then $f(z)$ is divisible by the $l$th power of the
polynomial $\varphi(z)$, i.e.

\[ f(z) = \varphi^l(z)\, q(z). \]

The polynomial $q(z)$, as a quotient of two polynomials with real
coefficients, has also real coefficients. By assumption it must have the
number $v$ as its $(k - l)$-fold root and must have no root equal to
$\bar{v}$. According to what was proved above this is impossible and
therefore $k = l$. Thus the nonreal roots of any polynomial with real
coefficients occur in complex conjugate pairs, both roots of a pair
having the same multiplicity. From the uniqueness of the canonical
factorization we can draw the following conclusion:

Any polynomial with real coefficients can be represented, up to an 
arrangement of the factors, uniquely as a product of its leading coef¬ 
ficient and polynomials with real coefficients. Those polynomials 
have leading coefficients equal to unity and are linear, if they corre¬ 
spond to real roots, and quadratic, if they correspond to a pair of com¬ 
plex conjugate roots. 
Finally, we proceed to the most important conclusion, for the sake of
which, properly speaking, the fundamental theorem of algebra was
proved. Let A be a linear operator in a complex space. The
eigenvalues of that operator, and they alone, are the roots of the
characteristic polynomial. By the fundamental theorem A has at
least one eigenvalue $\lambda$. Hence

Any linear operator in a complex vector space has at least one eigen¬ 
vector. 

Notice that if A is an operator in a real or rational space, this con¬ 
clusion is no longer valid. 
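
A standard illustration of this remark is the rotation of the plane
through a right angle. The sketch below is an example chosen here, not
one taken from the text, and its variable names are arbitrary. The
characteristic polynomial $\lambda^2 + 1$ has a negative discriminant,
so there is no real eigenvalue, while over the complex numbers an
eigenvector is found at once.

```python
import cmath

# Rotation of the plane through 90 degrees.
A = [[0, -1],
     [1, 0]]

trace = A[0][0] + A[1][1]
det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
# Discriminant of the characteristic polynomial lambda^2 - trace*lambda + det.
disc = trace * trace - 4 * det
print(disc < 0)  # True: no real root, hence no eigenvector in the real plane

# Over the complex numbers a root always exists.
lam = (trace + cmath.sqrt(disc)) / 2
# For a 2x2 matrix with A[1][0] != 0 this vector satisfies (A - lam*E) x = 0.
x = (lam - A[1][1], A[1][0])
Ax = (A[0][0] * x[0] + A[0][1] * x[1],
      A[1][0] * x[0] + A[1][1] * x[1])
assert all(abs(Ax[k] - lam * x[k]) < 1e-12 for k in range(2))
```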

In reference to eigenvalues we shall apply the same terminology
as in reference to roots of a polynomial. In particular, an eigenvalue
will be said to be simple, if it is a simple root of the characteristic
polynomial, and multiple otherwise. The multiplicity of an eigenvalue
$\lambda$ will be the multiplicity of $\lambda$ as a root of the
characteristic polynomial.


Exercises 

1. Prove that if a complex number $a \neq 0$, then for
any natural $n$ there are exactly $n$ distinct complex numbers whose
$n$th power is equal to $a$.

2. What is the relation between the roots of $f(z)$ and $f(z - a)$,
where $a$ is a complex number?

3. Let a polynomial $f(z)$ of degree not greater than $n$ with complex
coefficients assume equal values for $n + 1$ distinct values of the
independent variable. Prove that $f(z)$ is a polynomial of degree zero.

4. Prove that any polynomial of an odd degree with real coefficients has at
least one real root.

5. Prove that a polynomial $f(z)$ has at least one root in each of the
regions



6. Prove that an operator A has a simple structure if and only if there are
as many linearly independent eigenvectors corresponding to each of its
eigenvalues $\lambda$ as is the multiplicity of $\lambda$.
The Structure
of a Linear Operator


70. Invariant subspaces

All the studies to come next will be carried 
on under the hypothesis that the linear operator is given in a complex 
space X. As already noted earlier, this assumption ensures that 
every linear operator has at least one eigenvector. 

A subspace L of a vector space X is said to be invariant under an 
operator A if for each vector x of L its image Ax is also in L. 

Any linear operator has at least two trivial invariant subspaces, 
the zero subspace and the entire space X. Of vital importance are 
only nontrivial invariant subspaces. Among them are, for example, 
proper subspaces. Since in a complex vector space any operator clear¬ 
ly has at least one eigenvector, any operator in such a space must have 
at least one nontrivial invariant subspace. 

It is easy to verify that for every operator A its range $T_A$ and
kernel $N_A$ are invariant subspaces. They are trivial if and only if
A is nonsingular or zero.

If L is an invariant subspace, then there may be many ways to
construct a complementary subspace M such that $X = L \dotplus M$.
Among those complementary subspaces there may be no invariant
subspace, however. But if there is at least one invariant complementary
subspace, then we may speak of decomposing the space into a
direct sum of invariant subspaces.

Knowledge of some invariant subspace and certainly of a decomposition
of a space as a direct sum of invariant subspaces makes it possible
to construct a basis in which the matrix of the operator has a
simpler form. Let an operator A have in an m-dimensional space X
an invariant subspace L of dimension n. Choose a basis
$e_1, e_2, \ldots, e_m$ in X so that its first n vectors are in L. Then
the images $Ae_1, \ldots, Ae_n$ of the vectors $e_1, \ldots, e_n$ are in
L and it is possible to expand them with respect to the vectors
$e_1, \ldots, e_n$ as the basis vectors of L. Consequently,

\[ \begin{aligned}
Ae_1 &= a_{11} e_1 + a_{21} e_2 + \ldots + a_{n1} e_n, \\
&\;\;\vdots \\
Ae_n &= a_{1n} e_1 + a_{2n} e_2 + \ldots + a_{nn} e_n.
\end{aligned} \]

Recall that the column elements of the matrix of an operator coincide
with the coordinates of the images of basis vectors. Therefore
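the matrix $A_e$ of A in the basis $e_1, e_2, \ldots, e_m$ will be of the
form

\[ A_e = \begin{pmatrix}
a_{11} & \cdots & a_{1n} & a_{1,n+1} & \cdots & a_{1m} \\
\vdots &        & \vdots & \vdots    &        & \vdots \\
a_{n1} & \cdots & a_{nn} & a_{n,n+1} & \cdots & a_{nm} \\
0      & \cdots & 0      & a_{n+1,n+1} & \cdots & a_{n+1,m} \\
\vdots &        & \vdots & \vdots    &        & \vdots \\
0      & \cdots & 0      & a_{m,n+1} & \cdots & a_{mm}
\end{pmatrix}. \]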


As a rule matrices of such a type would be written in the so-called
block form. Namely,

\[ A_e = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}. \tag{70.1} \]

Here $A_{11}$ is a square $n \times n$ matrix, $A_{22}$ is a square
$(m - n) \times (m - n)$ matrix, 0 is a zero $(m - n) \times n$ matrix
and $A_{12}$ is an $n \times (m - n)$ matrix.

Suppose now that X is decomposed as a direct sum of invariant
subspaces L and M. Choose a basis $e_1, e_2, \ldots, e_m$ in X so that
its first n vectors are in L and the remaining m - n vectors are in M.
In this case the images $Ae_1, \ldots, Ae_n$ can be expanded only with
respect to the vectors $e_1, \ldots, e_n$ and the images
$Ae_{n+1}, \ldots, Ae_m$ only with respect to the vectors
$e_{n+1}, \ldots, e_m$. The matrix $A_{12}$ in (70.1) will obviously be
zero. Therefore the matrix $A_e$ of A in the basis under consideration
will have a still simpler form. Namely,

\[ A_e = \begin{pmatrix} A_{11} & 0 \\ 0 & A_{22} \end{pmatrix}. \]

Suppose the action of the operator A is investigated only on the
vectors of the invariant subspace L. If $x \in L$, then $Ax \in L$. Hence
it may be assumed that A generates on L some other operator, $A|L$,
defined by the equation

\[ (A|L)\, x = Ax \]

for every $x \in L$. The operator $A|L$ is called the induced operator
generated by the operator A. In relation to $A|L$ the operator A is
called the generator. By virtue of the linearity of A the induced
operator is also linear. It coincides with the generator A on L and is
not defined outside L. Thus these operators mainly differ in their
domains.

However artificial its introduction may seem, the induced operator
provides a very convenient auxiliary tool in carrying out diverse
studies. For example, the induced operator, as any other linear
operator, has at least one eigenvector. But since on its domain the
induced operator coincides with the generator, this means that

Any linear operator has at least one eigenvector in every nonzero
invariant subspace.

If a space is decomposed as a direct sum of r invariant subspaces, 
then the linear operator has at least r linearly independent eigen¬ 
vectors. 

It is clear that any eigenvalue and any eigenvector of an induced 
operator are respectively the eigenvalue and the eigenvector of the 
generator. Less obvious is 

Theorem 70.1. The characteristic polynomial of an induced operator 
generated on a nontrivial subspace is a divisor of the characteristic poly¬ 
nomial of the generator. 

Proof. Let an induced operator $A|L$ be defined on an invariant
subspace L. Again choose a basis $e_1, \ldots, e_m$ of the space X so
that the vectors $e_1, \ldots, e_n$ constitute a basis in L. If the
matrix of the generator is $A_e$ of (70.1), then the matrix of $A|L$ is
$A_{11}$ of (70.1). The characteristic polynomial is equal to
$\det(\lambda E - A_e)$ for A and to $\det(\lambda E - A_{11})$ for
$A|L$. Applying the Laplace theorem to expand the determinant
$\det(\lambda E - A_e)$ by the first n columns we find

\[ \det(\lambda E - A_e)
 = \det \begin{pmatrix} \lambda E - A_{11} & -A_{12} \\ 0 & \lambda E - A_{22} \end{pmatrix}
 = \det(\lambda E - A_{11}) \det(\lambda E - A_{22}). \]

This equation establishes the validity of the theorem.
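
The factorization in the proof can be checked numerically on a small
block matrix of the form (70.1). The sketch below is an illustration
under stated assumptions: the blocks are arbitrary sample data, the
helper names are invented here, and the characteristic polynomial is
computed by the Faddeev-LeVerrier recurrence, which is a convenient
choice for exact arithmetic and not the method of the text.

```python
from fractions import Fraction

def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def char_poly(A):
    """Coefficients [c_0, ..., c_n] of det(lambda*E - A), lowest degree
    first, by the Faddeev-LeVerrier recurrence."""
    n = len(A)
    c = [Fraction(0)] * (n + 1)
    c[n] = Fraction(1)
    M = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        AM = matmul(A, M)
        M = [[AM[i][j] + (c[n - k + 1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        AM = matmul(A, M)
        c[n - k] = -sum(AM[i][i] for i in range(n)) / k
    return c

def poly_mul(p, q):
    """Product of two polynomials given by coefficient lists."""
    r = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

# Sample blocks: span(e_1, e_2) is invariant because of the zero block.
A11 = [[1, 2], [3, 4]]
A12 = [[5], [6]]
A22 = [[7]]
A_e = [A11[0] + A12[0], A11[1] + A12[1], [0, 0] + A22[0]]

f = char_poly(A_e)
# The characteristic polynomial of the induced operator divides f.
assert f == poly_mul(char_poly(A11), char_poly(A22))
```

Here the quotient of the two characteristic polynomials is the
characteristic polynomial of the block $A_{22}$.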

Determining all eigenvalues of the operator A reduces to finding 
all roots of the characteristic polynomial. If A has a nontrivial in¬ 
variant subspace, then by Theorem 70.1 this problem can be reduced to 
finding all roots of two polynomials of lower degree. If the induced 
operator itself has a nontrivial invariant subspace, then the process 
of factoring the characteristic polynomial can be continued. 


Exercises 

1. Prove that the sum and the intersection of invariant 
subspaces are invariant subspaces. 

2. Prove that if A is a nonsingular operator, then any induced operator is 
also nonsingular. 

3. Prove that if A is an operator of a simple structure, then any induced 
operator has also a simple structure. 

4. In what case is an invariant subspace of an operator of a simple structure 
a direct sum of proper subspaces? 

5. Prove that if at least one invariant subspace of an operator A has no
complementary invariant subspace, then A cannot be an operator of a simple
structure.

6. Prove that if A is an operator of a simple structure, then its range and
kernel have no nonzero vectors in common.

7. Prove that if a subspace is invariant under an operator A, then it is also
invariant under the operator $a_0 E + a_1 A + \ldots + a_p A^p$.
71. The operator polynomial

One of the most important ways of constructing invariant subspaces
of a linear operator is by using polynomials with complex
coefficients.

Let A be some linear operator in a complex space X. Take a
polynomial

\[ \varphi(z) = a_0 + a_1 z + \ldots + a_p z^p \]

with complex coefficients and consider a linear operator

\[ \varphi(A) = a_0 E + a_1 A + \ldots + a_p A^p. \]

It is an operator in X and is called an operator polynomial or a
polynomial in the operator A.

Fix an operator A and construct the set of all operator polynomials
in A. Since the set of all polynomials is a commutative ring, so is
the set of all operator polynomials. In particular, it follows that

\[ \varphi(A)\, A = A\, \varphi(A) \]

for any polynomial $\varphi(z)$. The commutativity of the ring of
operator polynomials plays an exceptionally important part in all
further studies.

It is easy to show that the range $T_\varphi$ of any operator polynomial
$\varphi(A)$ is an invariant subspace for the operator A. Indeed, let
$x \in T_\varphi$. This means that $x = \varphi(A) y$ for some $y \in X$.
By the permutability of A and $\varphi(A)$ we have

\[ Ax = A \varphi(A) y = \varphi(A)(Ay). \]

Hence the vector Ax is the result of applying $\varphi(A)$ to the vector
$Ay \in X$, i.e. $Ax \in T_\varphi$.

The kernel $N_\varphi$ of the operator polynomial $\varphi(A)$ is also an
invariant subspace for the operator A. If $x \in N_\varphi$, then
$\varphi(A) x = 0$, but then

\[ \varphi(A)(Ax) = A(\varphi(A) x) = A(0) = 0. \]

It has already been noted earlier that there is at least one
eigenvector of the operator in any invariant subspace. Now it is
possible to make a more precise statement. Namely,

If an eigenvalue of an operator A is (is not) a root of a polynomial
$\varphi(z)$, then all eigenvectors of A corresponding to that eigenvalue
are in the kernel (the range) of the operator $\varphi(A)$.

Indeed, let x be an eigenvector of an operator A corresponding to
an eigenvalue $\lambda$. In the exercises to Section 65 it was stressed
that x is also an eigenvector for $\varphi(A)$ but corresponds to the
eigenvalue $\varphi(\lambda)$. Hence $\varphi(A) x = \varphi(\lambda) x$.
If $\lambda$ is a root of $\varphi(z)$, then $\varphi(\lambda) = 0$ and x
is in the kernel of $\varphi(A)$. But if $\varphi(\lambda) \neq 0$, then
$x = \varphi(A)\,(x / \varphi(\lambda))$, so x is the image of a vector
under $\varphi(A)$ and is therefore in the range of $\varphi(A)$.
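
The statements of this section can be tried out on a small matrix. In
the following sketch the matrix, the polynomial $\varphi(z) = z - 2$ and
all helper names are assumptions made for illustration. The operator
polynomial is evaluated by Horner's rule, and the code checks that it
commutes with A and that it acts on each eigenvector as multiplication
by $\varphi(\lambda)$.

```python
def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def matvec(P, x):
    """Product of a matrix and a vector."""
    return [sum(P[i][k] * x[k] for k in range(len(x))) for i in range(len(P))]

def poly_of_matrix(coeffs, A):
    """phi(A) = a_0 E + a_1 A + ... + a_p A^p, coeffs = [a_0, ..., a_p]."""
    n = len(A)
    result = [[0] * n for _ in range(n)]
    for a in reversed(coeffs):          # Horner's rule: result*A + a*E
        result = matmul(result, A)
        for i in range(n):
            result[i][i] += a
    return result

def poly_value(coeffs, z):
    """phi(z) for a scalar z."""
    value = 0
    for a in reversed(coeffs):
        value = value * z + a
    return value

A = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 3]]
phi = [-2, 1]                           # phi(z) = z - 2
phi_A = poly_of_matrix(phi, A)

# phi(A) and A commute, which is what makes range and kernel invariant.
assert matmul(phi_A, A) == matmul(A, phi_A)

x2 = [1, 0, 0]    # eigenvector for the eigenvalue 2, a root of phi
x3 = [0, 0, 1]    # eigenvector for the eigenvalue 3, not a root of phi
assert matvec(phi_A, x2) == [0, 0, 0]                       # x2 is in the kernel
assert matvec(phi_A, x3) == [poly_value(phi, 3) * c for c in x3]
```

Because $\varphi(3) = 1 \neq 0$, the last assertion shows that `x3` is
the image of `x3 / phi(3)` and so lies in the range of $\varphi(A)$.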






We cannot prove that any invariant subspace of A is either the
range or the kernel of some operator polynomial. This assertion is
false in general, which is exemplified by the identity operator. We
have $\varphi(E) = \varphi(1) E$ for any polynomial $\varphi(z)$, and
therefore the operator $\varphi(E)$ is either zero or nonsingular. Hence
range and kernel are always trivial subspaces for $\varphi(E)$. But an
invariant subspace of the operator E is any subspace. Nevertheless each
invariant subspace of A has a definite relation to operator polynomials
in A. We have

Theorem 71.1. Let L be an invariant subspace of an operator A. If
all eigenvalues of the operator induced on L are roots of a polynomial
$\varphi(z)$, then L is in the kernels of the operators $\varphi^k(A)$
for every sufficiently large positive integral power k.

Proof. Denote by $T'_1, T'_2, \ldots$ the ranges of the operators induced
on L using the operator polynomials $\varphi^k(A)$, with k = 1, 2, ... .
The operator $\varphi(A)$ is singular on L, since its kernel contains at
least all eigenvectors of A lying in L. Therefore $T'_1 \subset L$,
$\dim T'_1 < \dim L$. The subspace $T'_1$ is invariant under A. If $T'_1$
is nonzero, then by Theorem 70.1 the characteristic polynomial of the
operator induced on $T'_1$ using A is a divisor of the characteristic
polynomial of the operator induced on L using A. Hence all eigenvalues
of the operator induced on $T'_1$ are also roots of $\varphi(z)$. But it
again follows that $T'_2 \subset T'_1$, $\dim T'_2 < \dim T'_1$ and so
on. The dimensions of $T'_1, T'_2, \ldots$ cannot decrease without
limit. Beginning with some k therefore these subspaces will remain
zero, which means that the theorem is true.

The above studies result in establishing an important fact concern¬ 
ing the existence of nontrivial invariant subspaces. 

Theorem 71.2. Any linear operator A in an m-dimensional complex 
space X has at least one invariant subspace of dimension m — 1. 

Proof. An operator A has at least one eigenvector x. Let it
correspond to an eigenvalue $\lambda$. By what has been proved the range
$T_\lambda$ of the operator $A - \lambda E$ is an invariant subspace of
A. But since $A - \lambda E$ is singular, the subspace $T_\lambda$ has a
dimension not greater than m - 1.

Consider now any subspace L of dimension m - 1 entirely containing
$T_\lambda$. Any vector of X is transformed by $A - \lambda E$ into some
vector of $T_\lambda$. Any vector of L therefore again goes over into a
vector of L. Thus L is a subspace invariant under $A - \lambda E$ and of
course invariant under A. Thus the theorem is proved.


Exercises 

1. Let A be an operator of differentiation in a finite
dimensional real space of polynomials. What is the operator $\varphi(A)$
for a polynomial $\varphi(z)$ with real coefficients?

2. Let $\varphi(z)$ be the characteristic polynomial of an induced
operator generated by an operator A on an invariant subspace N. Prove
that N is in the kernel of the operator $\varphi^k(A)$ for some positive
integer k.

3. Prove that if all eigenvalues of an operator A are roots of a polynomial
$\varphi(z)$, then $\varphi^k(A) = 0$ for some positive integer k.

4. Prove that the ring of operator polynomials generated by any operator
has zero divisors.

5. Prove that if A is an operator of a simple structure, then $\varphi(A)$
is also an operator of a simple structure. Is the converse true?

72. The triangular form 

Now we can solve the problem of reducing the 
matrix of an operator to one of the simplest forms, the so-called 
triangular form. 

Theorem 72.1. For any linear operator A in an m-dimensional space
X there are invariant subspaces $L_p$ of dimension p,
p = 0, 1, ..., m - 1, m, such that

\[ L_0 \subset L_1 \subset \ldots \subset L_{m-1} \subset L_m. \]

Proof. The existence of $L_0$ and $L_m$ is obvious. By what was proved
earlier (Theorem 71.2) the operator A has an invariant subspace
$L_{m-1}$ of dimension m - 1.

Consider on $L_{m-1}$ an induced operator. As any other operator
on $L_{m-1}$ it has an invariant subspace $L_{m-2}$ of dimension m - 2.
But a subspace invariant under an induced operator is invariant
under the generator A too. Thus the existence of $L_{m-2}$ is proved. If
we consider an induced operator on $L_{m-2}$, we can similarly establish
the existence of $L_{m-3}$ and so on.

The theorem is interesting mainly because of its matrix
interpretation. Construct a basis $e_1, e_2, \ldots, e_m$ of X using the
invariant subspaces $L_p$. As a vector $e_1$ take any nonzero vector in
$L_1$, as a vector $e_2$ take any vector in $L_2$ that is not in $L_1$,
and in general take as a vector $e_p$ any vector in $L_p$ that is not in
$L_{p-1}$. Consider the matrix $A_e$ of A in that basis. Since $e_j$ is
in $L_j$ and $L_j$ is invariant under A, the vector $Ae_j$ must be a
linear combination of only the vectors $e_1, e_2, \ldots, e_j$. In the
expansion

\[ Ae_j = a_{1j} e_1 + a_{2j} e_2 + \ldots + a_{mj} e_m \]

then the coefficient of $e_i$ must be zero for every i > j. Hence the
matrix of A is of the form

\[ A_e = \begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1,m-1} & a_{1m} \\
0      & a_{22} & \cdots & a_{2,m-1} & a_{2m} \\
\vdots & \vdots & \ddots & \vdots    & \vdots \\
0      & 0      & \cdots & a_{m-1,m-1} & a_{m-1,m} \\
0      & 0      & \cdots & 0         & a_{mm}
\end{pmatrix}, \]

where $a_{ij} = 0$ for i > j.

A matrix all of whose elements under (above) the principal diagonal
are zero is called a right (left) triangular matrix. In matrix terms
the result obtained implies that any square matrix is similar to a
right triangular matrix.

The triangular form of matrix is widely used in proving diverse 
facts concerning linear operators. This is mainly due to the fol¬ 
lowing property: 

If an operator A has in some basis a triangular matrix A e , then 
the diagonal elements of A e coincide with the eigenvalues of A even 
taking into account their multiplicities. 

Indeed, using the Laplace theorem we find that the characteristic
polynomial of $A_e$ is equal to

\[ \det(\lambda E - A_e) = \prod_{j=1}^{m} (\lambda - a_{jj}), \]

which implies the validity of the above assertion.
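
The property can be observed on a matrix that is merely similar to a
triangular one. In the sketch below the matrices T and S, the helper
names and the use of exact Gaussian elimination for the determinant are
all assumptions made for illustration. Two polynomials of degree 3 that
agree at four points coincide, so comparing values at four points is
enough to identify the characteristic polynomial.

```python
from fractions import Fraction

def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def det(M):
    """Determinant by Gaussian elimination in exact arithmetic."""
    M = [[Fraction(v) for v in row] for row in M]
    n = len(M)
    d = Fraction(1)
    for col in range(n):
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            M[col], M[pivot] = M[pivot], M[col]
            d = -d                       # a row swap changes the sign
        d *= M[col][col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n):
                M[r][c] -= factor * M[col][c]
    return d

def char_value(M, lam):
    """Value of det(lam*E - M) at the number lam."""
    n = len(M)
    return det([[(lam if i == j else 0) - M[i][j] for j in range(n)]
                for i in range(n)])

T = [[1, 4, 5],          # right triangular, diagonal 1, 2, 2
     [0, 2, 6],
     [0, 0, 2]]
S = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]
S_inv = [[1, 0, 0], [-1, 1, 0], [1, -1, 1]]
assert matmul(S, S_inv) == [[1, 0, 0], [0, 1, 0], [0, 0, 1]]

A = matmul(matmul(S, T), S_inv)      # same operator in another basis

# det(lam*E - A) equals (lam - 1)(lam - 2)^2 at four points, hence identically.
for lam in range(4):
    assert char_value(A, lam) == (lam - 1) * (lam - 2) ** 2
```

The matrix A itself is not triangular, yet its eigenvalues with their
multiplicities are read off from the diagonal of T.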

Much of the further theory of linear operators is devoted to
improving the result just obtained, that on reducing the matrix of an
operator to triangular form. The simplest form the matrix of an
operator may have is the diagonal form. As we know, only the
matrices of operators of a simple structure can be reduced to this
form. However, even for operators that are not of a simple structure
the triangular form is not the simplest possible.

Exercises 

1. Prove that any square matrix is similar to a left
triangular matrix.

2. Prove that a set of left (or right) triangular matrices forms a ring.

3. Prove that a set of nonsingular left (or right) triangular matrices forms
a group.

4. Let $\lambda_1, \lambda_2, \ldots, \lambda_m$ be the eigenvalues of an
operator A written out in succession according to multiplicity. Prove
that, taking into account their multiplicities, the eigenvalues of the
operator $\varphi(A)$ for any polynomial $\varphi(z)$ are
$\varphi(\lambda_1), \varphi(\lambda_2), \ldots, \varphi(\lambda_m)$.

5. Prove that if all diagonal elements of a triangular $m \times m$
matrix A are zero, then $A^m = 0$.

6. Let a triangular matrix be similar to a diagonal matrix. Prove that the 
similarity transformation matrix may be chosen to be left (or right) triangular. 

73. A direct sum of operators 

A linear operator all of whose eigenvalues are 
equal is in a sense an exception. Nevertheless we shall show that it 
is such operators that any linear operator can be made up of. 

Let a space X be represented as a direct sum of subspaces L and
M. We define some operator B on L and some operator C on M. For
any vector $x \in X$ there is a unique decomposition

\[ x = x_L + x_M, \tag{73.1} \]

where $x_L \in L$ and $x_M \in M$. An operator A defined by

\[ Ax = Bx_L + Cx_M \]

is called a direct sum of B and C. If one of the subspaces L and M is
trivial, then the direct sum is also called trivial.

It is easy to verify that A is a linear operator in X. We show that
it can be represented only uniquely as a direct sum of operators
defined on L and M. Indeed, for any vector $x \in L$ we have Ax = Bx.
Similarly Ax = Cx for any $x \in M$. This means that B coincides
with the induced operator $A|L$ and C coincides with $A|M$.

Consider now an operator A in a space X. If X is decomposed in
some way into a direct sum of subspaces L and M invariant under
A, then the operator A itself can be decomposed as a direct sum. Indeed,
construct $A|L$ and $A|M$. On decomposing again a vector $x \in X$
as a sum (73.1) we get

\[ Ax = (A|L)\, x_L + (A|M)\, x_M. \]

In this case, by Theorem 70.1, the characteristic polynomial of A
is equal to the product of the characteristic polynomials of $A|L$
and $A|M$.

The operator A can be decomposed as a direct sum using any
operator polynomial $\varphi(A)$. Denote by $N_k$ the kernel of the
operator $\varphi^k(A)$. This is a subspace invariant under A and it is
obvious that $N_1 \subset N_2 \subset \ldots$ . We first prove that if
$N_k = N_{k+1}$ for some k, then $N_k = N_p$ for every p > k. Indeed,
take any vector $x \in N_p$. Then $\varphi^p(A) x = 0$. On writing this
as $\varphi^{k+1}(A)(\varphi^{p-k-1}(A) x) = 0$ we conclude that the
vector $\varphi^{p-k-1}(A) x \in N_{k+1}$. By virtue of $N_k = N_{k+1}$
the same vector is in $N_k$. Consequently,

\[ \varphi^k(A)(\varphi^{p-k-1}(A) x) = \varphi^{p-1}(A) x = 0, \]

i.e. the vector $x \in N_{p-1}$. The validity of the above assertion can
now be established by induction on p.

The space X in which A is an operator is finite dimensional.
Therefore the dimensions of the subspaces $N_k$ cannot increase without
limit. Let q be the smallest positive integer for which
$N_q = N_{q+1}$. Denote by $T_k$ the range of the operator
$\varphi^k(A)$ and consider any vector x common for the subspaces $T_q$
and $N_q$. We have $\varphi^q(A) x = 0$ and $x = \varphi^q(A) y$ for some
vector $y \in X$. It follows that $\varphi^{2q}(A) y = 0$, i.e.
$y \in N_{2q}$. But by what has been proved $N_q = N_{2q}$. Therefore
$y \in N_q$, i.e. $x = \varphi^q(A) y = 0$.

Thus $T_q$ and $N_q$ have only a zero vector in common. In view of
formula (56.3) this means that $X = T_q \dotplus N_q$. Since $T_q$ and
$N_q$ are invariant subspaces, the possibility of decomposing the
operator is established.
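
The construction can be followed step by step on a small example. The
sketch below rests on assumptions made for illustration: the matrix A,
the polynomial $\varphi(z) = z - 2$ and the helper names are chosen
here. It finds the smallest q at which the kernels stop growing and
checks that the range and the kernel of $\varphi^q(A)$ meet only in the
zero vector, using the fact that this happens exactly when
$\varphi^{2q}(A)$ has the same rank as $\varphi^q(A)$.

```python
from fractions import Fraction

def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def rank(M):
    """Rank by Gaussian elimination in exact arithmetic."""
    M = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(M), len(M[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][c] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][c] / M[r][c]
            for j in range(c, cols):
                M[i][j] -= factor * M[r][j]
        r += 1
        if r == rows:
            break
    return r

def stable_power(B):
    """Smallest q with ker B^q = ker B^(q+1), together with B^q."""
    power, q = B, 1
    while True:
        nxt = matmul(power, B)
        # N_q lies in N_(q+1), so equal ranks mean equal kernels.
        if rank(nxt) == rank(power):
            return q, power
        power, q = nxt, q + 1

A = [[2, 1, 0, 0],
     [0, 2, 0, 0],
     [0, 0, 3, 1],
     [0, 0, 0, 3]]
m = len(A)
# phi(z) = z - 2, so phi(A) = A - 2E.
B = [[A[i][j] - (2 if i == j else 0) for j in range(m)] for i in range(m)]

q, Bq = stable_power(B)
dim_T = rank(Bq)          # dimension of the range T_q
dim_N = m - dim_T         # dimension of the kernel N_q
assert rank(matmul(Bq, Bq)) == dim_T   # T_q and N_q share only the zero vector
print(q, dim_T, dim_N)    # 2 2 2
```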
As already noted earlier, all eigenvectors of A must be in $T_q$ and
$N_q$, with $N_q$ containing the eigenvectors that correspond to the
eigenvalues coinciding with some roots of the polynomial $\varphi(z)$
and $T_q$ containing those for which the corresponding eigenvalues
coincide with none of the roots of $\varphi(z)$. Since to every
eigenvalue there corresponds at least one eigenvector, it follows that:

Each root of the characteristic polynomial of the operator induced
on $N_q$ is a root of $\varphi(z)$, and no root of the characteristic
polynomial of the operator induced on $T_q$ is a root of $\varphi(z)$.

A final characteristic of decompositions of an operator as a direct 
sum using operator polynomials is furnished by 

Theorem 73.1. Let the characteristic polynomial $f(z)$ of an operator
A be decomposed as a product of polynomials $\varphi(z)$ and $\psi(z)$
having no roots in common. Then A can be decomposed uniquely as a direct
sum of operators B and C with the characteristic polynomials
$\varphi(z)$ and $\psi(z)$.

Proof. Consider a decomposition of A as a direct sum obtained
using the polynomial $\varphi(z)$. Since the product of the
characteristic polynomials of the operators defining the direct sum
coincides with the characteristic polynomial $f(z)$, the existence of at
least one decomposition follows from the above studies.

Suppose now that there is another decomposition of the space X
as a direct sum of invariant subspaces N and T. The induced operator
on N has the characteristic polynomial $\varphi(z)$ and the operator on
T has the characteristic polynomial $\psi(z)$. By Theorem 71.1,
$N \subset N_k$ for every sufficiently large k, and therefore
$N \subset N_q$. The operator $\varphi(A)$ is nonsingular on T, and
hence the set of the images of vectors from T relative to $\varphi(A)$
coincides with T. But this means that $T \subset T_k$ for every k. The
subspaces N and T, as well as $N_q$ and $T_q$, in the direct sum form
the space X. Therefore the inclusions $N \subset N_q$ and
$T \subset T_q$ are possible only when $N = N_q$ and $T = T_q$. Thus the
theorem is proved.

Let A be an operator in an m-dimensional space X. We represent
the characteristic polynomial $f(z)$ of A as the canonical
factorization

\[ f(z) = (z - \lambda_1)^{k_1} (z - \lambda_2)^{k_2} \cdots (z - \lambda_r)^{k_r}, \tag{73.2} \]

where $\lambda_1, \lambda_2, \ldots, \lambda_r$ are mutually distinct
eigenvalues and $k_1 + k_2 + \ldots + k_r = m$. Consider the polynomials

\[ (z - \lambda_1)^{k_1}, \; (z - \lambda_2)^{k_2}, \; \ldots, \; (z - \lambda_r)^{k_r}. \]

They are divisors of the characteristic polynomial $f(z)$ and no pair
of them have roots in common. By Theorem 73.1, there are invariant
subspaces $R_1, R_2, \ldots, R_r$ such that

\[ X = R_1 \dotplus R_2 \dotplus \ldots \dotplus R_r. \]

The dimension of a subspace $R_i$ is equal to $k_i$ and the induced
operator on $R_i$ has the characteristic polynomial
$(z - \lambda_i)^{k_i}$.

A subspace $R_i$ is called a root subspace of A corresponding to the
eigenvalue $\lambda_i$. Vectors of a root subspace are called root
vectors. It follows from what has been said that any operator can be
decomposed as a direct sum of operators induced on root subspaces.

A root subspace $R_i$ coincides with the kernel of the operator
$((A - \lambda_i E)^{k_i})^q$ for some positive integer q. We show that
in this case it is always possible to put q = 1. Consider the operators
$(A - \lambda_i E)^p$ for p = 1, 2, ... . Let $p_i$ be the smallest
number for which the kernel of $(A - \lambda_i E)^{p_i}$ coincides with
that of $(A - \lambda_i E)^{p_i + 1}$. Then $R_i$ will coincide with the
kernel of $(A - \lambda_i E)^{p_i}$. Since the dimensions of the kernels
of $(A - \lambda_i E)^p$ for $p = 1, 2, \ldots, p_i$ are strictly
increasing and the dimension of $R_i$ is equal to $k_i$, we have

\[ p_i \leq k_i. \]

Thus $R_i$ corresponding to an eigenvalue $\lambda_i$ of multiplicity
$k_i$ clearly coincides with the kernel of $(A - \lambda_i E)^{k_i}$.

Theorem 73.2 (Cayley-Hamilton). If $f(z)$ is the characteristic
polynomial of an operator A, then $f(A)$ is a zero operator.

Proof. Let us represent the characteristic polynomial as the
canonical factorization (73.2). Since the operator polynomial $f(A)$
contains the factor $(A - \lambda_i E)^{k_i}$ and any polynomials in the
same operator are commutative, $f(A) x_i = 0$ for any vector $x_i$ in
$R_i$. Now take a vector x and represent it as
$x = x_1 + x_2 + \ldots + x_r$, where $x_i \in R_i$. It is now clear
that $f(A) x = 0$, i.e. that $f(A)$ is a zero operator.
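
The theorem is easy to check on a matrix whose canonical factorization
is known in advance. In the sketch below the triangular matrix A and
the helper names are assumptions made for illustration. Since A is
triangular, its characteristic polynomial is $(z - 2)^2 (z - 3)$, and
the product of the corresponding operator factors is computed directly.

```python
def matmul(P, Q):
    """Product of two matrices given as lists of rows."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def shifted(A, lam):
    """The matrix A - lam*E."""
    n = len(A)
    return [[A[i][j] - (lam if i == j else 0) for j in range(n)]
            for i in range(n)]

def matpow(B, k):
    """k-th power of a square matrix, k >= 0."""
    result = identity(len(B))
    for _ in range(k):
        result = matmul(result, B)
    return result

A = [[2, 1, 0],
     [0, 2, 0],
     [0, 0, 3]]
n = len(A)
zero = [[0] * n for _ in range(n)]

# Canonical factorization f(z) = (z - 2)^2 (z - 3): (eigenvalue, multiplicity).
factors = [(2, 2), (3, 1)]

f_A = identity(n)
for lam, k in factors:
    f_A = matmul(f_A, matpow(shifted(A, lam), k))
assert f_A == zero                      # f(A) is the zero operator

# Lowering the multiplicity of the eigenvalue 2 no longer gives zero here.
g_A = matmul(shifted(A, 2), shifted(A, 3))
assert g_A != zero
```

The second check shows that for this particular matrix the full
multiplicity of the eigenvalue 2 is needed in the product.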

Of great interest is again the matrix interpretation of the results
obtained. Compose a basis of the space as a successive combination of
any bases of the root subspaces $R_1, R_2, \ldots, R_r$. Root subspaces
are invariant and their direct sum coincides with X. Therefore the
matrix $A_e$ of A in the basis has the so-called quasi-diagonal form

\[ A_e = \begin{pmatrix}
A_{11} &        &        & 0 \\
       & A_{22} &        &   \\
       &        & \ddots &   \\
0      &        &        & A_{rr}
\end{pmatrix}. \tag{73.3} \]

Each $A_{ii}$ is a $k_i \times k_i$ matrix that is the matrix of the
operator induced on the subspace $R_i$.


Exercises 

1. Can an operator of differentiation in a finite dimen¬ 
sional space of polynomials be decomposed as a nontrivial direct sum? 

2. Prove that a system of root vectors corresponding pairwise to distinct 
eigenvalues is linearly independent. 

3. Prove that if an operator A is nonsingular, then $A^{-1} = \varphi(A)$
for some polynomial $\varphi(z)$.

4. An operator A is said to be nilpotent if $A^p = 0$ for some positive
integer p. Prove that an operator is nilpotent if and only if all its
eigenvalues are zero.

5. Let $\varphi(z)$ be a polynomial of the lowest degree for which
$\varphi(A) = 0$. Prove that $\varphi(z)$ is a divisor of the
characteristic polynomial of A.
74. The Jordan canonical form

A further simplification of the matrix of an operator as compared
with the quasi-diagonal form (73.3) can be effected only by special
construction of bases for each of the root subspaces. Root bases can
of course be chosen so that each matrix $A_{ii}$ in (73.3) is
triangular. This form of the matrix of an operator is not the simplest
either, however.

Let us study in more detail the structure of root subspaces. If
$x \in R_j$, then $(A - \lambda_j E)^{k_j} x = 0$. But for every
particular vector x the equation $(A - \lambda_j E)^m x = 0$ may well
hold also for $m < k_j$. In particular, if x is an eigenvector
corresponding to a multiple eigenvalue $\lambda_j$, then
$(A - \lambda_j E) x = 0$, although $k_j \geq 2$.

The height of a root vector x is the smallest nonnegative integer m
such that $(A - \lambda_j E)^m x = 0$.

All root vectors corresponding to an eigenvalue $\lambda_j$ are of
height not greater than the multiplicity of $\lambda_j$. Recall,
however, that in general the heights of root vectors and the
multiplicities of eigenvalues are two distinct notions. Thus, for
example, for an operator of a simple structure there are no root
vectors of height greater than unity at all, regardless of the
multiplicities of the eigenvalues.

Let $R_j$ be a root subspace corresponding to an eigenvalue
$\lambda_j$ of multiplicity $k_j$. Denote by t the maximum height of
root vectors in $R_j$. It is clear that $t \leq k_j$. If a vector x is
of height $k \geq 1$, then the vector $(A - \lambda_j E) x$ will be of
height k - 1. There are therefore root vectors of all heights from 0 to
t in $R_j$.
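
The notion of height can be made concrete with a short computation. In
the sketch below the function name `height`, the sample matrix and the
error raised for a vector that is not a root vector are assumptions
made for illustration. The operator $A - \lambda E$ is applied
repeatedly until the zero vector appears, and the number of
applications is the height.

```python
def matvec(P, x):
    """Product of a matrix and a vector."""
    return [sum(P[i][k] * x[k] for k in range(len(x))) for i in range(len(P))]

def height(A, lam, x):
    """Smallest m >= 0 with (A - lam*E)^m x = 0.

    Raises ValueError if x is not a root vector for lam, which is the
    case when n applications do not annihilate it (n is the dimension).
    """
    n = len(A)
    B = [[A[i][j] - (lam if i == j else 0) for j in range(n)]
         for i in range(n)]
    m, v = 0, list(x)
    while any(v):
        if m == n:
            raise ValueError("x is not a root vector for this eigenvalue")
        v = matvec(B, v)
        m += 1
    return m

# One eigenvalue 2 of multiplicity 3; the whole space is its root subspace.
J = [[2, 1, 0],
     [0, 2, 1],
     [0, 0, 2]]

print(height(J, 2, [0, 0, 0]))  # 0
print(height(J, 2, [1, 0, 0]))  # 1, an eigenvector
print(height(J, 2, [0, 1, 0]))  # 2
print(height(J, 2, [0, 0, 1]))  # 3, the maximum height t

# Applying A - 2E lowers the height by one.
B = [[0, 1, 0], [0, 0, 1], [0, 0, 0]]
assert height(J, 2, matvec(B, [0, 0, 1])) == 2
```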

For any $k \leq t$, denote by $H_k$ the collection of all vectors whose
heights are at most k. It is easy to show that $H_k$ is a subspace in
$R_j$. If $x, y \in H_k$, then
$(A - \lambda_j E)^k x = (A - \lambda_j E)^k y = 0$. But then for any
$\alpha$ and $\beta$ we have
$(A - \lambda_j E)^k (\alpha x + \beta y) = 0$, i.e.
$\alpha x + \beta y \in H_k$. It is, further, obvious that

\[ 0 = H_0 \subset H_1 \subset \ldots \subset H_{t-1} \subset H_t = R_j. \]

We denote the dimensions of these subspaces by $m_k$,
$0 = m_0 < m_1 < \ldots < m_{t-1} < m_t = k_j$.

Let $f_1, \ldots, f_{p_1}$ be arbitrary linearly independent vectors
from $H_t$ such that the direct sum of their span and $H_{t-1}$ is
$H_t$. It is clear that they are root vectors of height t, that
$p_1 = m_t - m_{t-1}$ and that no nonzero linear combination of the
vectors $f_1, \ldots, f_{p_1}$ is in $H_{t-1}$. Consider the collection
of vectors

\[ \begin{array}{lll}
f_1, & \ldots, & f_{p_1}, \\
(A - \lambda_j E) f_1, & \ldots, & (A - \lambda_j E) f_{p_1}, \\
(A - \lambda_j E)^2 f_1, & \ldots, & (A - \lambda_j E)^2 f_{p_1}, \\
\cdots & & \cdots \\
(A - \lambda_j E)^{t-1} f_1, & \ldots, & (A - \lambda_j E)^{t-1} f_{p_1}.
\end{array} \tag{74.1} \]
We show that they are linearly independent. Indeed, compose their
linear combination and equate it to zero. On applying to both sides
of the resulting equation the operator $(A - \lambda_i E)^{t-1}$ we find that the
linear combination of vectors $f_1, \ldots, f_{p_1}$ is sent by $(A - \lambda_i E)^{t-1}$
to the zero vector, i.e. that it is a vector in $H_{t-1}$. Hence the coefficients
of these vectors must be zero. On applying now to the same
equation the operator $(A - \lambda_i E)^{t-2}$ we similarly find that the coefficients
of the vectors in the second row of (74.1) must be zero and so on.

Notice that by virtue of the choice of vectors $f_1, \ldots, f_{p_1}$ no nonzero
linear combination of vectors in the $j$th row of (74.1) is in $H_{t-j}$.
We supplement the vectors $(A - \lambda_i E) f_1, \ldots, (A - \lambda_i E) f_{p_1}$
with vectors $f_{p_1+1}, \ldots, f_{p_2}$ of $H_{t-1}$ such that the entire collection
is linearly independent and the direct sum of its span and $H_{t-2}$
is $H_{t-1}$. It is clear that they will be root vectors of height $t - 1$,
that $p_2 = m_{t-1} - m_{t-2}$ and that no nonzero linear combination of the
vectors is in $H_{t-2}$. We again construct the collection of vectors

$$
\begin{array}{lll}
f_{p_1+1}, & \ldots, & f_{p_2}, \\
(A - \lambda_i E) f_{p_1+1}, & \ldots, & (A - \lambda_i E) f_{p_2}, \\
\ldots & & \\
(A - \lambda_i E)^{t-2} f_{p_1+1}, & \ldots, & (A - \lambda_i E)^{t-2} f_{p_2}.
\end{array}
\qquad (74.2)
$$

With respect to the collection of vectors $(A - \lambda_i E) f_1, \ldots, (A - \lambda_i E) f_{p_1}$,
$f_{p_1+1}, \ldots, f_{p_2}$ we can prove all the facts proved with
respect to the collection of vectors $f_1, \ldots, f_{p_1}$, replacing of course
$t$ by $t - 1$. Going thus to subspaces $H_{t-2}, H_{t-3}, \ldots, H_1$ we
obtain a linearly independent system of $k_i$ vectors lying in the root
subspace $R_i$. Arrays of the type (74.1) and (74.2) end with an array
containing a single row

$$f_{p_{t-1}+1}, \ldots, f_{p_t}. \qquad (74.3)$$

These vectors are in $H_1$, i.e. they are eigenvectors, and $p_t = m_1 - m_0$.

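We arrange the arrays of the type (74.1) to (74.3) successively from
left to right by aligning them by the last row and introducing a
more compact notation for each vector. Then the following array results:

$$
\begin{array}{lllllll}
e_1^{(t)}, & \ldots, & e_{p_1}^{(t)}, & & & & \\
e_1^{(t-1)}, & \ldots, & e_{p_1}^{(t-1)}, & e_{p_1+1}^{(t-1)}, & \ldots, & e_{p_2}^{(t-1)}, & \\
\ldots & & & & & & \\
e_1^{(1)}, & \ldots, & e_{p_1}^{(1)}, & e_{p_1+1}^{(1)}, & \ldots, & e_{p_2}^{(1)}, & \ldots, \; e_{p_{t-1}+1}^{(1)}, \ldots, e_{p_t}^{(1)}.
\end{array}
\qquad (74.4)
$$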


The vectors in the first row of (74.4) are of height $t$, the vectors in
the next row are of height $t - 1$ and so on. The vectors of the last
row are of height 1, i.e. the operator $A - \lambda_i E$ sends them to the zero
vector. Each column of (74.4) defines an invariant subspace of
$A - \lambda_i E$ and hence of the operator $A$. These subspaces are called
cyclic. The first $p_1$ cyclic subspaces are of dimension $t$, the next
$p_2 - p_1$ subspaces are of dimension $t - 1$ and so on. The last columns
define one-dimensional cyclic subspaces. The entire root subspace $R_i$
is a direct sum of the $p_t$ cyclic subspaces.

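We write the matrix of the operator induced in a cyclic subspace.
Suppose, for example, that vectors $e_1^{(1)}, e_1^{(2)}, \ldots, e_1^{(t)}$ are taken
as a basis. Since

$$(A - \lambda_i E) e_1^{(1)} = 0, \quad (A - \lambda_i E) e_1^{(2)} = e_1^{(1)}, \quad \ldots, \quad (A - \lambda_i E) e_1^{(t)} = e_1^{(t-1)},$$

we have

$$A e_1^{(1)} = \lambda_i e_1^{(1)}, \quad A e_1^{(2)} = \lambda_i e_1^{(2)} + e_1^{(1)}, \quad \ldots, \quad A e_1^{(t)} = \lambda_i e_1^{(t)} + e_1^{(t-1)}.$$

Hence the matrix of the induced operator has the following form:

$$
\begin{pmatrix}
\lambda_i & 1 & 0 & \ldots & 0 & 0 \\
0 & \lambda_i & 1 & \ldots & 0 & 0 \\
\ldots & \ldots & \ldots & \ldots & \ldots & \ldots \\
0 & 0 & 0 & \ldots & \lambda_i & 1 \\
0 & 0 & 0 & \ldots & 0 & \lambda_i
\end{pmatrix}
$$

Matrices of this form are called Jordan canonical boxes.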
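The structure of a Jordan box can be checked on a small case. The following is a minimal sketch in plain Python, not part of the original text; the helper names `jordan_box`, `matmul` and `nilpotency_index` are illustrative assumptions. It builds a box of size $t$ and finds the smallest power of $J - \lambda E$ that vanishes, which should equal $t$, the height of the top vector of the cyclic subspace.

```python
def jordan_box(lam, t):
    """Return the t x t Jordan box with eigenvalue lam as a list of rows."""
    return [[lam if i == j else 1 if j == i + 1 else 0 for j in range(t)]
            for i in range(t)]


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def nilpotency_index(lam, t):
    """Smallest k such that (J - lam*E)^k = 0 for the t x t Jordan box J."""
    j = jordan_box(lam, t)
    shift = [[j[r][c] - (lam if r == c else 0) for c in range(t)]
             for r in range(t)]
    power = shift
    k = 1
    # Keep multiplying by (J - lam*E) until the power becomes the zero matrix.
    while any(any(v != 0 for v in row) for row in power):
        power = matmul(power, shift)
        k += 1
    return k


box = jordan_box(5, 3)
index = nilpotency_index(5, 3)
```

Here `box` has 5 on the diagonal and 1 on the superdiagonal, and `index` is the size of the box.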

We shall now construct a basis of a space as a successive combination
of the bases of root subspaces $R_1, R_2, \ldots, R_r$. As a basis of
each root subspace $R_i$ we take vectors of the type (74.4) ordered in
succession from bottom to top and from left to right. A space basis
constructed in this way is called a root basis.

In a root basis the matrix $J$ of an operator $A$ assumes the so-called
Jordan canonical form. It is a quasi-diagonal matrix made up of Jordan
boxes. First come Jordan boxes corresponding to an eigenvalue
$\lambda_1$, in nonincreasing order of their sizes. Then, in the same order,
come Jordan boxes corresponding to $\lambda_2$ and so on. Thus

In general some of the Jordan boxes of smaller sizes may be
lacking, of course.

Specifying an operator in a vector space defines a class of similar 
matrices. The result obtained implies that any square matrix can be 
reduced by similarity transformation to a Jordan canonical form. 
It is clear that two square matrices of the same size are similar if 
and only if they have identical Jordan forms. Given a fixed basis 
therefore 

Two square matrices of the same size define the same operator in a com¬ 
plex space if and only if they have identical Jordan forms. 


Exercises 

1. Let $x$ be a root vector of height $\nu$ corresponding
to an eigenvalue $\lambda_i$ of an operator $A$. Prove that if $\lambda_i$ is a root of multiplicity
$p$ of a polynomial $\varphi(z)$, then the vector $v = \varphi(A)x$ is a root vector of height
$r = \max\{0, \nu - p\}$ corresponding to the same eigenvalue $\lambda_i$. What can be
said about the vector $v$ if $\lambda_i$ is not a root of $\varphi(z)$?

2. Let $x$ be a nonzero vector and let $\varphi(z)$ be a polynomial of the lowest degree
such that $\varphi(A)x = 0$. Prove that $\varphi(z)$ is a divisor of the characteristic
polynomial of $A$.

3. Prove that any square matrix can be reduced to a unique Jordan canonical
form up to a permutation of Jordan boxes.

4. Prove that if a matrix is similar to the matrix $J$ of (74.5), then it is similar
to $J'$ as well.

5. Prove that square matrices $A$ and $A'$ are the matrices of the same operator.

6. Let $J$ be a Jordan canonical matrix. What is the form of matrices $J^p$ for
positive integers $p$?


75. The adjoint operator 

Now we proceed to study linear operators in 
a unitary space. Of course, all the results obtained earlier for opera¬ 
tors in a complex space hold in this case too. We shall study therefore 
only the additional properties of operators connected with the con¬ 
cept of orthogonality. In some cases we shall also consider operators 
from one unitary space into another. The principal part in our stud¬ 
ies will be played by the so-called adjoint operator. 

Let $X$ and $Y$ be two unitary spaces. An operator $A^*$ from $Y$ to $X$
is said to be adjoint to an operator $A$ from $X$ to $Y$ if for any vectors
$x \in X$ and $y \in Y$

$$(Ax, y) = (x, A^*y). \qquad (75.1)$$

Theorem 75.1. For any linear operator $A$ there is an adjoint operator
$A^*$ which is unique.

Proof. Choose in $X$ some orthonormal basis $e_1, e_2, \ldots, e_m$. Recall
that for any vector $x \in X$ there is an expansion

$$x = \sum_{k=1}^{m} (x, e_k) e_k. \qquad (75.2)$$





If $A^*$ exists, then, by this formula, for any vector $y \in Y$

$$A^*y = \sum_{k=1}^{m} (A^*y, e_k) e_k$$

or, considering (75.1),

$$A^*y = \sum_{k=1}^{m} \overline{(e_k, A^*y)}\, e_k = \sum_{k=1}^{m} \overline{(Ae_k, y)}\, e_k = \sum_{k=1}^{m} (y, Ae_k) e_k. \qquad (75.3)$$

And this means that if $A^*$ exists, then it is unique.

Now take (75.3) to be the definition of $A^*$. It is easy to verify that
the operator $A^*$ thus constructed is linear. It satisfies equation (75.1)
too. Indeed, considering that the system $e_1, e_2, \ldots, e_m$ is orthonormal
and taking into account (75.2) and (75.3) we get for any vectors
$x \in X$ and $y \in Y$

$$(Ax, y) = \Big(A \sum_{k=1}^{m} (x, e_k) e_k, \; y\Big) = \sum_{k=1}^{m} (x, e_k)(Ae_k, y),$$

$$(x, A^*y) = \Big(\sum_{k=1}^{m} (x, e_k) e_k, \; \sum_{k=1}^{m} (y, Ae_k) e_k\Big) = \sum_{k=1}^{m} (x, e_k)\overline{(y, Ae_k)} = \sum_{k=1}^{m} (x, e_k)(Ae_k, y).$$

Thus the theorem is proved.

The adjoint operator $A^*$ is connected with $A$ by definite relations.
Note some of them:

$$
\begin{aligned}
(A^*)^* &= A, \\
(A + B)^* &= A^* + B^*, \\
(\alpha A)^* &= \bar{\alpha} A^*, \\
(AB)^* &= B^* A^*, \\
(A^*)^{-1} &= (A^{-1})^*.
\end{aligned}
\qquad (75.4)
$$

Here the bar over $\alpha$ means complex conjugation. All the relations
can be proved according to the same scheme. We shall prove in detail
therefore only the first and the last property.

Consider an operator $A$ and the adjoint operator $A^*$. The adjoint
operator of $A^*$ will in turn be an operator $(A^*)^*$. Now for any
$x \in X$ and $y \in Y$ we have

$$(y, (A^*)^*x) = (A^*y, x) = \overline{(x, A^*y)} = \overline{(Ax, y)} = (y, Ax).$$

The left-hand side is equal to the right-hand side for any vector $y$.
Hence $(A^*)^*x = Ax$. But since this equation holds for any $x$, this
means that $(A^*)^* = A$.
Suppose now that $A$ is an operator in $X$ and is nonsingular. We
first prove that $A^*$ is also nonsingular. Let $A^*y = 0$. According to
(75.3) it follows that

$$\sum_{k=1}^{m} (y, Ae_k) e_k = 0.$$

Since the system of vectors $e_1, \ldots, e_m$ is a basis,

$$(y, Ae_k) = 0 \qquad (75.5)$$

for every $k = 1, 2, \ldots, m$. Since $A$ is nonsingular, it converts any
basis again into a basis. But then the system of vectors $Ae_1, \ldots, Ae_m$
is also a basis and it follows from (75.5) that $y = 0$. Thus
the kernel of $A^*$ contains only the zero vector, i.e. $A^*$ is nonsingular.
Take vectors $x, y \in X$. There are unique vectors $u$ and $v$ such that

$$Au = x, \quad A^*v = y.$$

We then find

$$(x, (A^{-1})^*y) = (A^{-1}x, y) = (u, A^*v) = (Au, v) = (x, (A^*)^{-1}y).$$

The left-hand side equals the right-hand side for any $x$. Hence
$(A^{-1})^*y = (A^*)^{-1}y$. Since $y$ is arbitrary, this means that
$(A^{-1})^* = (A^*)^{-1}$.

Many compatible properties of operators $A$ and $A^*$ can be established
from investigating the matrices of these operators. Choose an
orthonormal basis $e_1, e_2, \ldots, e_m$ in $X$ and an orthonormal basis
$q_1, q_2, \ldots, q_n$ in $Y$. If $X$ and $Y$ coincide, it will be assumed that so
do their bases. Suppose that a matrix $A_{qe}$ with elements $a_{ij}$ corresponds
to $A$. Then

$$Ae_j = \sum_{i=1}^{n} a_{ij} q_i.$$

From this and (75.2) we conclude that

$$a_{ij} = (Ae_j, q_i). \qquad (75.6)$$

Also suppose that corresponding to $A^*$ in the same bases is a matrix
$A^*_{eq}$ with elements $a^*_{ij}$. By (75.6)

$$a^*_{ij} = (A^*q_j, e_i).$$

Comparing the elements $a_{ij}$ and $a^*_{ij}$ and considering (75.1) we find

$$a^*_{ij} = (A^*q_j, e_i) = \overline{(e_i, A^*q_j)} = \overline{(Ae_i, q_j)} = \bar{a}_{ji}.$$

This formula justifies the following definition:

An $m \times n$ matrix $A^*$ with elements $a^*_{ij}$ is said to be the adjoint
of an $n \times m$ matrix $A$ with elements $a_{ij}$ if $a^*_{ij} = \bar{a}_{ji}$ for all $i$ and $j$.
Thus, corresponding to adjoint operators in any orthonormal bases
are adjoint matrices. Adjoint matrices clearly satisfy all relations
(75.4). An adjoint matrix $A^*$ is related to a matrix $A$ by the operations
of transposition and complex conjugation. That is,

$$A^* = \overline{(A')} = (\bar{A})'. \qquad (75.7)$$

Here the bar means that all matrix elements are replaced by their
complex conjugates.
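Relations (75.1) and (75.7) can be tried out on a small rectangular matrix. The following is a minimal sketch in plain Python, not taken from the original text; the helper names `adjoint`, `apply` and `inner`, as well as the sample data, are illustrative assumptions. The scalar product is taken linear in its first argument and conjugate-linear in its second, in agreement with $(\alpha A)^* = \bar{\alpha} A^*$.

```python
def adjoint(a):
    """Conjugate transpose of a matrix given as a list of rows, as in (75.7)."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]


def apply(a, x):
    """Matrix-vector product."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]


def inner(x, y):
    """Scalar product: linear in x, conjugate-linear in y."""
    return sum(xi * complex(yi).conjugate() for xi, yi in zip(x, y))


# A 3 x 2 matrix: an operator from a 2-dimensional X to a 3-dimensional Y.
A = [[1 + 2j, 3], [0, 1j], [2, 1 - 1j]]
x = [1 - 1j, 2]          # vector of X
y = [1j, 1, 3 + 1j]      # vector of Y

lhs = inner(apply(A, x), y)            # (Ax, y)
rhs = inner(x, apply(adjoint(A), y))   # (x, A*y)
```

The two values `lhs` and `rhs` should agree, and `adjoint(A)` is a 2 x 3 matrix, as the definition requires.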

The rank of an operator coincides with that of its matrix. Therefore
it follows from (75.7) that operators $A$ and $A^*$ have the same rank.

Denote by $N \subset X$, $N^* \subset Y$ and $T \subset Y$, $T^* \subset X$
respectively the kernels and ranges of operators $A$ and $A^*$. If $x \in N$,
then $Ax = 0$ and $(x, A^*y) = (Ax, y) = 0$ for any $y \in Y$. This means that the range of $A^*$ is
a subspace orthogonal to the kernel of $A$. Of course, the range of $A$
is also orthogonal to the kernel of $A^*$. From the equality of the dimensions
of the subspaces $T$ and $T^*$ and from relations of the type
(56.4) we conclude that

$$X = N \oplus T^*, \quad Y = N^* \oplus T. \qquad (75.8)$$


A basis $y_1, y_2, \ldots, y_m$ of a unitary space $X$ is said to be dual
to a basis $x_1, x_2, \ldots, x_m$ of the same space if

$$
(x_i, y_j) =
\begin{cases}
0 & \text{if } i \ne j, \\
1 & \text{if } i = j.
\end{cases}
$$

A dual basis is not infrequently used to study compatible properties
of operators $A$ and $A^*$ in the same space. We first prove that any
basis has a dual which is unique. Let $x_1, x_2, \ldots, x_m$ be a basis. For
any $j$, a vector $y_j$ must be orthogonal to vectors $x_1, \ldots, x_{j-1}$
and $x_{j+1}, \ldots, x_m$ and hence to the span $L_j$ constructed on those
vectors. It follows that $y_j$ lies in the one-dimensional subspace $L_j^{\perp}$.
The normalization condition $(x_j, y_j) = 1$ defines it uniquely.

It is clear that a basis will be dual to itself if and only if it is orthonormal.
The duality relation of bases is symmetrical and therefore
it makes sense to speak of a pair of mutually dual bases. Mutually
dual bases are called biorthonormal.

Theorem 75.2. If in some basis an operator $A$ has a matrix $J$, then in
the basis dual to the given one the adjoint operator $A^*$ has a matrix $J^*$.

Proof. Let $A$ and $A^*$ have in an orthonormal basis $e_1, e_2, \ldots, e_m$
corresponding matrices $A_e$ and $A_e^*$ and let $A$ have a matrix
$J$ in a basis $x_1, x_2, \ldots, x_m$. Denote by $P$ a coordinate transformation
matrix for a change from $e_1, e_2, \ldots, e_m$ to $x_1, x_2, \ldots, x_m$. Then by
(64.5) we have

$$J = P^{-1} A_e P.$$

Applying matrix conjugation to the left- and right-hand sides of this
we find

$$J^* = P^* A_e^* (P^{-1})^*$$

or equivalently

$$J^* = \big((P^{-1})^*\big)^{-1} A_e^* (P^{-1})^*.$$

This relation shows that the adjoint operator $A^*$ has a matrix $J^*$
in a basis $y_1, y_2, \ldots, y_m$ for which the coordinate transformation
matrix for a change from the basis $e_1, e_2, \ldots, e_m$ is $(P^{-1})^*$. According
to (63.3) the coordinates of the vectors $x_1, x_2, \ldots, x_m$ in
$e_1, e_2, \ldots, e_m$ are column elements of the matrix $P$, and the coordinates
of the vectors $y_1, y_2, \ldots, y_m$ in $e_1, e_2, \ldots, e_m$ are column
elements of the matrix $(P^{-1})^*$. Calculating pairwise scalar products
of vectors of the basis $x_1, x_2, \ldots, x_m$ and vectors of $y_1, y_2, \ldots, y_m$
is equivalent to calculating the elements of the matrix $P'\,\overline{(P^{-1})^*}$.
But

$$P'\,\overline{(P^{-1})^*} = P'\,\overline{\overline{(P^{-1})'}} = P'(P^{-1})' = (P^{-1}P)' = E.$$

Hence the basis $y_1, y_2, \ldots, y_m$ is dual to $x_1, x_2, \ldots, x_m$.
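The construction used in the proof can be traced on a two-dimensional example. The following is a minimal sketch in plain Python, not taken from the original text; the matrix `P`, the helper names and the closed-form 2 x 2 inverse are illustrative assumptions. The columns of `P` hold a basis, the columns of $(P^{-1})^*$ hold the candidate dual basis, and the table of pairwise scalar products should be the identity matrix.

```python
def inverse2(p):
    """Inverse of a 2 x 2 matrix by the explicit cofactor formula."""
    (a, b), (c, d) = p
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def adjoint(a):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]


def column(a, j):
    """j-th column of a matrix as a list."""
    return [row[j] for row in a]


def inner(x, y):
    """Scalar product: linear in x, conjugate-linear in y."""
    return sum(xi * complex(yi).conjugate() for xi, yi in zip(x, y))


P = [[1, 1j], [0, 2]]                 # columns: x_1 = (1, 0), x_2 = (i, 2)
Q = adjoint(inverse2(P))              # columns: the dual basis y_1, y_2
gram = [[inner(column(P, i), column(Q, j)) for j in range(2)]
        for i in range(2)]            # entries (x_i, y_j)
```

For this sample the basis is not orthonormal, so the dual basis differs from the basis itself, in line with the remark that only orthonormal bases are self-dual.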

Theorem 75.2 allows many consequences to be deduced. If, for
example, $J$ is a Jordan canonical matrix, then there are eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_m$ along its diagonal. But the eigenvalues of the matrix
$J^*$ are $\bar{\lambda}_1, \bar{\lambda}_2, \ldots, \bar{\lambda}_m$. Therefore the eigenvalues of $A^*$ are all
the complex conjugates of the eigenvalues of $A$. If $A$ is an operator
of a simple structure, then Theorem 75.2 makes it possible to say
that the adjoint operator $A^*$ has also a simple structure. Basis systems
of the eigenvectors of $A$ and $A^*$ can be chosen so that they are
biorthonormal and so on.


Exercises 

1. Suppose the coordinates of the vectors of some basis
of a Euclidean space in an orthonormal basis $e_1, e_2, \ldots, e_m$ form the
columns of a matrix $A$. Prove that the coordinates of the vectors of the
dual basis in the same basis $e_1, e_2, \ldots, e_m$ form the rows of the matrix $A^{-1}$.

2. How are the characteristic polynomials of operators $A$ and $A^*$ related?

3. Prove that if some subspace is invariant under an operator $A$, then its
orthogonal complement is invariant under $A^*$.

4. Prove that any eigenvector of an operator $A$ corresponding to an eigenvalue
$\lambda$ is orthogonal to any eigenvector of the operator $A^*$ corresponding to an
eigenvalue $\mu \ne \bar{\lambda}$.

5. Prove that any root vector of an operator $A$ corresponding to an eigenvalue
$\lambda$ is orthogonal to any root vector of the operator $A^*$ corresponding to an
eigenvalue $\mu \ne \bar{\lambda}$.
76. The normal operator

The existence of an orthonormal basis in a 
space and of a basis consisting of eigenvectors of a linear operator is 
of great importance in making diverse studies. Our immediate task 
therefore is to study a class of operators that have in a unitary space 
orthonormal basis systems consisting of eigenvectors. Such operators 
clearly exist. Among them, for example, are all scalar operators. 

Theorem 76.1 (Schur). For any linear operator in a unitary space 
there is an orthonormal basis in which the matrix of the operator is tri¬ 
angular. 

Proof. Consider, for example, the case of a right triangular matrix.
By Theorem 72.1, for any operator $A$ there are invariant subspaces
$L_p$, $p = 1, 2, \ldots, m$, such that the dimension of $L_p$ is $p$ and every
subspace with a smaller index is in all subspaces with larger indices.
The desired basis is constructed as follows. As a vector $e_1$ we take
any normed vector of $L_1$. As $e_2$ we take a normed vector of $L_2$ orthogonal
to $L_1$ and so on. As $e_m$ we take a normed vector of $L_m$ orthogonal
to $L_{m-1}$. The basis $e_1, e_2, \ldots, e_m$ is orthonormal and, as noted
in Section 72, the matrix of an operator in such a basis is right triangular.

A linear operator $A$ is said to be normal if it is commutative with
its adjoint, i.e.

$$AA^* = A^*A.$$

We show that normal operators, and normal operators alone, have
in a unitary space basis systems of orthonormal eigenvectors.

The following remark is helpful in the study of these operators. If
a triangular matrix is commutative with its adjoint, then it is diagonal.
Indeed, let, for example, an $m \times m$ matrix $B$ be right triangular
and let $B^*B = BB^*$. Denote by $b_{ij}$ elements of $B$. The condition
that the diagonal elements of the matrix $B^*B - BB^*$ should be
zero gives the following system of equations in nondiagonal elements
of $B$:

$$
\begin{aligned}
-|b_{12}|^2 - |b_{13}|^2 - |b_{14}|^2 - \ldots - |b_{1m}|^2 &= 0, \\
|b_{12}|^2 - |b_{23}|^2 - |b_{24}|^2 - \ldots - |b_{2m}|^2 &= 0, \\
|b_{13}|^2 + |b_{23}|^2 - |b_{34}|^2 - \ldots - |b_{3m}|^2 &= 0, \\
\ldots \qquad & \\
|b_{1m}|^2 + |b_{2m}|^2 + |b_{3m}|^2 + \ldots + |b_{m-1,m}|^2 &= 0.
\end{aligned}
$$

Since the unique solution of this system is a zero solution, this
proves the validity of the above remark.

Theorem 76.2. For an operator in a unitary space to be normal it is
necessary and sufficient that it should have a basis system of orthonormal
eigenvectors.
Proof. Let A be a normal operator. Choose by Theorem 76.1 an
orthonormal basis such that the matrix of the operator is triangular. 
In the same basis the operator A* has corresponding to it an adjoint 
triangular matrix. Under the hypothesis A is normal, and therefore 
the matrices of A and A*, in the chosen basis, must be commutative. 
According to the above remark these matrices are diagonal. So we 
have constructed an orthonormal basis in which the matrix of the 
operator has a diagonal form. This means that that basis is made up 
entirely of eigenvectors of the operator. 

Suppose now that A has a basis system of orthonormal eigenvectors. 
Then in the basis made up of those vectors the matrix of A will be 
diagonal. But corresponding in the same basis to the operator A* 
is an adjoint matrix that is obviously also diagonal. Diagonal ma¬ 
trices are always commutative, and therefore so are A and A*. 
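The defining condition $AA^* = A^*A$ is easy to test on concrete matrices. The following is a minimal sketch in plain Python, not taken from the original text; the helper names and the two sample matrices are illustrative assumptions. The first matrix commutes with its adjoint, while the second is a nondiagonal triangular matrix and therefore, by the remark above, cannot be normal.

```python
def adjoint(a):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def is_normal(a, tol=1e-12):
    """Check A A* = A* A entry by entry, up to a small tolerance."""
    left = matmul(a, adjoint(a))
    right = matmul(adjoint(a), a)
    return all(abs(left[i][j] - right[i][j]) <= tol
               for i in range(len(a)) for j in range(len(a)))


rotation_like = [[1, -1], [1, 1]]     # commutes with its adjoint
triangular = [[1, 1], [0, 1]]         # right triangular, not diagonal

normal_flags = (is_normal(rotation_like), is_normal(triangular))
```

The tolerance parameter `tol` only matters for matrices with inexact floating-point entries; for the integer samples above the comparison is exact.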

In proving the theorem we have shown that if an operator A is 
normal, then in a basis made up of orthonormal eigenvectors not only 
the matrix of A but also the matrix of A* is diagonal. This leads to 

Corollary. If A is a normal operator , then any orthonormal system 
of eigenvectors of A is an orthonormal system of eigenvectors of A*, 
and vice versa. 

Corollary. If A is a normal operator, then the eigenvalues of A and 
A* corresponding to the eigenvector they have in common are complex 
conjugate. 

Indeed, if $Ax = \lambda x$ and $A^*x = \mu x$, then by (75.1) for any normed
eigenvector $x$ we have

$$\lambda = (\lambda x, x) = (Ax, x) = (x, A^*x) = (x, \mu x) = \bar{\mu}.$$

Of course, this fact holds for any operator A sharing eigenvectors 
with A*. The normality of A ensures that there are common vectors. 

The significance of normal operators in the general theory is ac¬ 
counted for by two circumstances. One is that they constitute one of 
the simplest classes of operators in a unitary space. The other is that 
investigation of an arbitrary operator not infrequently reduces to a 
study of normal operators. 


Exercises 

1. Let $A$ be a linear operator and let $\alpha$ and $\beta$ be complex
numbers equal in absolute value. Prove that $\alpha A + \beta A^*$ is a normal
operator.

2. Let $A$ be a normal operator. Prove that for any polynomial $\varphi(z)$ the
operator $\varphi(A)$ is normal.

3. Prove that for a normal operator any induced operator is normal.

4. Prove that an operator $A$ is normal if and only if for any invariant subspace
$L$ its orthogonal complement $L^{\perp}$ is also invariant.

5. Let $A$ be an operator of a simple structure in a complex space. Prove
that $A$ can always be made normal by an appropriate assignment of a scalar
product in its space.
77. Unitary and Hermitian operators

Among the normal operators the most widely
used are operators of two types, unitary and Hermitian operators.

A linear operator $U$ is said to be unitary if its adjoint operator $U^*$
coincides with its inverse $U^{-1}$, i.e.

$$UU^* = U^*U = E.$$

Theorem 77.1. A normal operator $U$ is unitary if and only if all
its eigenvalues are equal to unity in absolute value.

Proof. Let $U$ be a unitary operator. Take any of its eigenvalues
$\lambda$ and the corresponding normed eigenvector $x$. We have

$$1 = (x, x) = (x, U^*Ux) = (Ux, Ux) = (\lambda x, \lambda x) = \lambda\bar{\lambda}(x, x) = |\lambda|^2.$$

Suppose now that all eigenvalues of the normal operator $U$ are
equal to unity in absolute value. Let $x_1, \ldots, x_m$ denote orthonormal
eigenvectors of $U$ and $\lambda_1, \ldots, \lambda_m$ its eigenvalues. Under the
hypothesis, $|\lambda_i| = 1$ for every $i$. Recall that for the adjoint operator
$U^*$, $x_1, \ldots, x_m$ remain eigenvectors but correspond to the eigenvalues
$\bar{\lambda}_1, \ldots, \bar{\lambda}_m$. Take a vector $x$ and expand it with respect to the
eigenvectors of $U$

$$x = \alpha_1 x_1 + \ldots + \alpha_m x_m.$$

Now

$$
\begin{aligned}
U^*Ux = U^*(Ux) &= U^*(\alpha_1\lambda_1 x_1 + \ldots + \alpha_m\lambda_m x_m) \\
&= \alpha_1\lambda_1\bar{\lambda}_1 x_1 + \ldots + \alpha_m\lambda_m\bar{\lambda}_m x_m = \alpha_1 x_1 + \ldots + \alpha_m x_m = x.
\end{aligned}
$$

Since $x$ is an arbitrary vector, this means that $U^*U = E$. Similarly
for $UU^* = E$.

Theorem 77.2. An operator U is unitary if and only if for any two 
vectors their scalar product equals that of their images. 

Proof. Let U be a unitary operator. Then for any two vectors x 
and y we have 

(x, y) = (x, U*Uy) = (Ux, Uy). (77.1) 

Suppose now that given some operator U equations (77.1) hold for 
any vectors x and y. It follows that 

(x, (U*U — E)y) =0. 

Since $x$ and $y$ are arbitrary, this means that $U^*U = E$. The operator
$U$ is nonsingular, for otherwise the equation $U^*U = E$ would be impossible.
Hence the operator $U^{-1}$ exists. Multiplying $U^*U = E$ by
$U$ on the left and by $U^{-1}$ on the right we obtain another equation,
$UU^* = E$. So $U$ is a unitary operator.
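Theorem 77.2 can be illustrated numerically. The following is a minimal sketch in plain Python, not taken from the original text; the matrix `U`, the test vectors and the helper names are illustrative assumptions. It checks that $U^*U = E$ for a sample matrix and that the scalar product of two vectors equals that of their images, up to floating-point rounding.

```python
def adjoint(a):
    """Conjugate transpose of a matrix given as a list of rows."""
    return [[complex(a[i][j]).conjugate() for i in range(len(a))]
            for j in range(len(a[0]))]


def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def apply(a, x):
    """Matrix-vector product."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]


def inner(x, y):
    """Scalar product: linear in x, conjugate-linear in y."""
    return sum(xi * complex(yi).conjugate() for xi, yi in zip(x, y))


U = [[0.6, 0.8j], [0.8j, 0.6]]        # sample matrix with orthonormal columns
x = [1 + 2j, -1j]
y = [3, 1 - 1j]

gram = matmul(adjoint(U), U)          # should be the identity matrix
before = inner(x, y)                  # (x, y)
after = inner(apply(U, x), apply(U, y))   # (Ux, Uy)
```

Because the entries 0.6 and 0.8 are not exactly representable in binary floating point, the comparison of `before` with `after` should be made with a small tolerance rather than with exact equality.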
Corollary. An operator $U$ is unitary if and only if either $UU^* = E$
or $U^*U = E$.

Corollary. Any unitary operator carries any orthonormal system of
vectors again into an orthonormal system.

Corollary. If a linear operator $U$ carries any orthonormal basis again
into an orthonormal basis, then $U$ is a unitary operator.

Indeed, let $x_1, \ldots, x_m$ be an orthonormal basis and let $Ux_i = y_i$,
where $y_1, \ldots, y_m$ is also an orthonormal basis. Take two
vectors, $x$ and $y$. If

$$x = \sum_{i=1}^{m} \alpha_i x_i, \quad y = \sum_{i=1}^{m} \beta_i x_i,$$

then

$$(x, y) = \sum_{i=1}^{m} \alpha_i\bar{\beta}_i.$$

By the linearity of $U$

$$Ux = \sum_{i=1}^{m} \alpha_i y_i, \quad Uy = \sum_{i=1}^{m} \beta_i y_i.$$

Therefore again

$$(Ux, Uy) = \sum_{i=1}^{m} \alpha_i\bar{\beta}_i.$$

So equations (77.1) hold for any vectors $x$ and $y$.

Notice that we could define a unitary operator as an isometric operator,
i.e. an operator preserving the lengths of all vectors. This follows
from Theorem 77.2 and from the easily verifiable relation

$$(x, y) = \frac{|x + y|^2 - |x - y|^2 + i|x + iy|^2 - i|x - iy|^2}{4}.$$
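The relation expressing the scalar product through lengths can be checked directly. The following is a minimal sketch in plain Python, not taken from the original text; the helper names and the sample vectors are illustrative assumptions, and the scalar product is taken linear in its first argument and conjugate-linear in its second.

```python
def inner(x, y):
    """Scalar product: linear in x, conjugate-linear in y."""
    return sum(complex(a) * complex(b).conjugate() for a, b in zip(x, y))


def norm_sq(x):
    """Squared length |x|^2 = (x, x), which is always real."""
    return inner(x, x).real


def combo(x, y, c):
    """The vector x + c*y."""
    return [a + c * b for a, b in zip(x, y)]


def polarization(x, y):
    """Recover (x, y) from four squared lengths."""
    return (norm_sq(combo(x, y, 1)) - norm_sq(combo(x, y, -1))
            + 1j * norm_sq(combo(x, y, 1j))
            - 1j * norm_sq(combo(x, y, -1j))) / 4


x = [1 + 2j, 3 - 1j]
y = [2 - 1j, 1j]

direct = inner(x, y)
recovered = polarization(x, y)
```

Since `polarization` uses only lengths, any operator that preserves lengths must also preserve `direct`, which is the content of the remark above.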

A linear operator H is said to be Hermitian or self-adjoint if it 
coincides with its adjoint, i.e. 

H = H*. 

Theorem 77.3. A normal operator H is Hermitian if and only if all 
its eigenvalues are real numbers. 

Proof. Let $H$ be a Hermitian operator. Take any of its eigenvalues
$\lambda$ and the corresponding normed eigenvector $x$. We have

$$\lambda = (\lambda x, x) = (Hx, x) = (x, H^*x) = (x, Hx) = (x, \lambda x) = \bar{\lambda},$$

i.e. $\lambda$ is a real number. Suppose now that the normal operator $H$
has real eigenvalues. Then in a basis made up of orthonormal eigenvectors
of $H$ the matrices of $H$ and $H^*$ will coincide. Hence so do the
operators themselves, i.e. $H$ is a Hermitian operator.
A Hermitian operator $H$ is said to be nonnegative (positive definite)
if for any (nonzero) vector $x$

$$(Hx, x) \ge 0 \quad (> 0).$$

Theorem 77.4. A Hermitian operator $H$ is nonnegative (positive
definite) if and only if all its eigenvalues are nonnegative (positive).

Proof. Choose an orthonormal basis made up of eigenvectors
$x_1, \ldots, x_m$ of a Hermitian operator $H$. Then it follows from the expansion

$$x = \xi_1 x_1 + \ldots + \xi_m x_m$$

for a vector $x$ that

$$(Hx, x) = \lambda_1|\xi_1|^2 + \ldots + \lambda_m|\xi_m|^2.$$

Hence, if all eigenvalues of a Hermitian operator are nonnegative
(positive), then the operator itself is also nonnegative (positive definite).
Putting $x = x_i$ we get

$$(Hx_i, x_i) = \lambda_i$$

for every $i$. Therefore all eigenvalues of a nonnegative (positive definite)
operator are nonnegative (positive).

It follows from the foregoing that a positive definite operator is a 
nonsingular nonnegative operator. Among all the Hermitian opera¬ 
tors nonnegative and positive definite operators play an especially 
important role. We note some of their properties. 

If $H$ and $S$ are positive definite operators, then the operator
$\alpha H + \beta S$ is positive definite for any nonnegative numbers $\alpha$ and $\beta$ not both
zero.

Indeed, the operator $\alpha H + \beta S$ is Hermitian for any real numbers
$\alpha$ and $\beta$. If, however, those numbers are nonnegative and are not both
zero, then

$$((\alpha H + \beta S)x, x) = \alpha(Hx, x) + \beta(Sx, x) > 0$$

for $x \ne 0$.

If an operator $H$ is positive definite, then $H^{-1}$ is also a positive definite
operator.

Indeed, since $H = H^*$, we have $H^{-1} = (H^*)^{-1} = (H^{-1})^*$, i.e.
the operator $H^{-1}$ is Hermitian. The eigenvalues of $H^{-1}$ are inverses
of the eigenvalues of $H$. Therefore they are positive and $H^{-1}$ is positive
definite.

If $H$ is positive definite and $A$ is a nonsingular operator, then $A^*HA$
and $AHA^*$ are positive definite operators.

It is easy to verify that they are Hermitian. By the nonsingularity
of $A$ we have $Ax \ne 0$ and $A^*x \ne 0$ for any $x \ne 0$. Therefore

$$(A^*HAx, x) = (HAx, Ax) > 0, \quad (AHA^*x, x) = (HA^*x, A^*x) > 0$$

for $x \ne 0$. In particular, it follows that for any nonsingular operator
$A$ the operators $A^*A$ and $AA^*$ are positive definite. But if $A$
is a singular operator, then $A^*A$ and $AA^*$ are nonnegative.

For any nonnegative operator $H$ there is a nonnegative operator $S$ such
that $S^2 = H$.

Indeed, let $\lambda_1, \ldots, \lambda_m$ be the eigenvalues of $H$ and $x_1, \ldots, x_m$
the corresponding orthonormal eigenvectors. Then $Hx_i = \lambda_i x_i$ for
every $i$. Let $S$ be defined by the equations $Sx_i = \sqrt{\lambda_i}\,x_i$. The operator
$S$ is nonnegative, since it has a basis system of orthonormal eigenvectors
$x_1, \ldots, x_m$ corresponding to nonnegative eigenvalues
$\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_m}$. Besides, $S^2x_i = Hx_i = \lambda_i x_i$. Thus, $S^2$ and $H$
coincide on the vectors of the basis $x_1, \ldots, x_m$ and therefore they
do on all vectors, i.e. $S^2 = H$.
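The construction of $S$ from the eigenvalues and eigenvectors of $H$ can be followed on a real symmetric 2 x 2 example. The following is a minimal sketch in plain Python, not taken from the original text; the matrix, its eigenpairs (worked out by hand for this sample) and the helper names are illustrative assumptions. The operator with eigenvalues $f(\lambda_i)$ on the eigenvectors $x_i$ is assembled as the sum of $f(\lambda_i)\,x_i x_i'$.

```python
import math

# Eigenpairs of H = [[2, 1], [1, 2]], found by hand for this sample:
# eigenvalue 1 with vector (1, -1)/sqrt(2), eigenvalue 3 with (1, 1)/sqrt(2).
s = 1 / math.sqrt(2)
eigenpairs = [(1.0, [s, -s]), (3.0, [s, s])]


def from_spectrum(pairs, f):
    """Matrix acting as f(lambda_i) on each orthonormal eigenvector x_i."""
    n = len(pairs[0][1])
    return [[sum(f(lam) * v[i] * v[j] for lam, v in pairs)
             for j in range(n)] for i in range(n)]


def matmul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


H = from_spectrum(eigenpairs, lambda lam: lam)   # rebuilds [[2, 1], [1, 2]]
S = from_spectrum(eigenpairs, math.sqrt)         # S x_i = sqrt(lambda_i) x_i
S_squared = matmul(S, S)                         # should reproduce H
```

The formula with `v[i] * v[j]` is valid here because the eigenvectors are real; for complex eigenvectors the second factor would have to be conjugated.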

A nonnegative operator $S$ is said to be the principal square root
of a nonnegative operator $H$ if $S^2 = H$.

It is important to stress that all eigenvectors of $S$ and $H$ coincide.

Indeed, suppose $\lambda_1, \ldots, \lambda_r$ and $\sqrt{\lambda_1}, \ldots, \sqrt{\lambda_r}$ are the distinct
eigenvalues of $H$ and $S$ respectively. Denote by $X_i$ ($Y_i$), $i = 1, 2, \ldots, r$,
the proper subspace of the operator $H$ ($S$) containing all
eigenvectors corresponding to an eigenvalue $\lambda_i$ ($\sqrt{\lambda_i}$). The direct
sums of the proper subspaces $X_1, \ldots, X_r$ and $Y_1, \ldots, Y_r$ coincide
with the entire space. Therefore

$$\dim X_1 + \ldots + \dim X_r = \dim Y_1 + \ldots + \dim Y_r. \qquad (77.2)$$

It is clear that $Y_i \subseteq X_i$ for every $i$, i.e. $\dim Y_i \le \dim X_i$. Hence
(77.2) can hold only if for every $i$ we have $\dim Y_i = \dim X_i$, i.e.
$Y_i = X_i$.

So the eigenvalues and eigenvectors of $S$ are uniquely defined by
$H$. Since $S$ is a Hermitian operator, this means that the principal
root of $H$ can be only unique.

Exercises 

1. Prove that the set of all unitary operators in a 
given unitary space forms a group relative to multiplication. 

2. Prove that the set of all Hermitian operators in a given unitary space 
forms a group relative to addition. 

3. Let an operator A be Hermitian and B positive definite. Prove that the 
eigenvalues of the operators BA and B~ l A are real. 

4. Prove that if $A$ and $B$ are positive definite operators, then all eigenvalues
of the operator $BA$ are positive.

5. Prove that if $A$ and $B$ are commutative positive definite operators, then
the operator $BA$ is also positive definite.

6. Prove that if $A$ is a positive definite operator in a unitary space, then the
function $(x, y)_A = (Ax, y)$ satisfies all the scalar product axioms.
78. Operators $A^*A$ and $AA^*$

If $A$ is an operator from a unitary space $X$
to a unitary space $Y$, then an operator $A^*A$ is defined in $X$ and an
operator $AA^*$ in $Y$. These operators will play an important role in
our further studies. Therefore we shall now proceed to investigate
them.

From the first and fourth properties of (75.4) it follows that $A^*A$
and $AA^*$ are Hermitian operators. Moreover, they are nonnegative,
since for any vectors $x \in X$ and $y \in Y$ we have

$$(A^*Ax, x) = (Ax, Ax) \ge 0,$$

$$(AA^*y, y) = (A^*y, A^*y) \ge 0.$$

Therefore there is a nonnegative operator $G$ in $X$ and a nonnegative
operator $F$ in $Y$ such that

$$A^*A = G^2, \quad AA^* = F^2.$$

The operators $G$ and $F$ satisfying these relations are unique.

Whatever the operator $A$, the operator $A^*A$ has an orthonormal
system of eigenvectors $x_1, x_2, \ldots, x_m$. The operator $A$ always carries
that system into some orthogonal system. Indeed, let

$$A^*Ax_k = \rho_k^2 x_k, \quad \rho_k \ge 0 \qquad (78.1)$$

for all $k = 1, 2, \ldots, m$. Then

$$(Ax_k, Ax_l) = (A^*Ax_k, x_l) = \rho_k^2 (x_k, x_l) = 0$$

for $k \ne l$. In addition, for every $k$

$$|Ax_k| = \rho_k,$$

and therefore the vector $Ax_k$ is nonzero if and only if the eigenvalue
$\rho_k^2$ of $A^*A$ is nonzero.

The nonzero vector $Ax_k$ is an eigenvector of $AA^*$ and corresponds
to the eigenvalue $\rho_k^2$. Indeed, by (78.1)

$$AA^*(Ax_k) = A(A^*Ax_k) = A(\rho_k^2 x_k) = \rho_k^2 Ax_k.$$

Thus all nonzero eigenvalues of $A^*A$ are eigenvalues of $AA^*$.
The converse is also true of course. Therefore the nonzero eigenvalues
of $A^*A$ and $AA^*$ always coincide.

Eigenvalues of $A^*A$ and $AA^*$ will be denoted by $\rho_k^2$.
It may be assumed without loss of generality that

$$\rho_1 \ge \rho_2 \ge \ldots \ge \rho_t > 0$$

and that the other numbers $\rho_k$ are zero. It is obvious that the eigenvalues
of $A^*A$ and $AA^*$ differ only in the multiplicity of the zero
eigenvalue. The multiplicity of the zero eigenvalue of $A^*A$ is $(m - t)$ and that of $AA^*$
is $(n - t)$.

The principal square roots of the common eigenvalues of $A^*A$
and $AA^*$ are called singular (or principal) values of $A$.

Using eigenvectors of $A^*A$ and $AA^*$ it is possible to construct such
orthonormal bases in spaces $X$ and $Y$ with the aid of which it is easy
to describe and investigate operators $A$ and $A^*$. Take as a basis in
$X$ an orthonormal system $x_1, \ldots, x_m$ of eigenvectors of $A^*A$. It
follows from (75.8) that vectors $x_1, \ldots, x_t$ form a basis in $T^*$ and
that vectors $x_{t+1}, \ldots, x_m$ form a basis in $N$. The orthonormal basis
$y_1, \ldots, y_n$ in $Y$ is constructed as follows. As $y_1, \ldots, y_t$ we take the
vectors obtained after the normalization of $Ax_1, \ldots, Ax_t$. These
vectors form a basis in $T$. As $y_{t+1}, \ldots, y_n$ we take any orthonormal
basis in $N^*$. It is clear that $y_1, \ldots, y_n$ are eigenvectors for $AA^*$
and form a basis in $Y$. Considering that $|Ax_k| = \rho_k$ we now find

$$
Ax_k =
\begin{cases}
\rho_k y_k, & k \le t, \\
0, & k > t.
\end{cases}
\qquad (78.2)
$$

Multiplying these equations by $A^*$ and taking into account (78.1)
we get

$$
A^*y_k =
\begin{cases}
\rho_k x_k, & k \le t, \\
0, & k > t.
\end{cases}
\qquad (78.3)
$$

The orthonormal bases in $X$ and $Y$ connected with $A$ and $A^*$ by
relations (78.2) and (78.3) are called singular bases.
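Relations (78.2) and (78.3) can be traced on a real 2 x 2 example. The following is a minimal sketch in plain Python, not taken from the original text; the matrix `A`, the eigenvectors of $A^*A$ (worked out by hand for this sample) and the helper names are illustrative assumptions. For a real matrix the adjoint is simply the transpose.

```python
import math

A = [[3.0, 0.0], [4.0, 5.0]]


def transpose(a):
    """Transpose, which is the adjoint of a real matrix."""
    return [list(col) for col in zip(*a)]


def apply(a, x):
    """Matrix-vector product."""
    return [sum(a[i][j] * x[j] for j in range(len(x))) for i in range(len(a))]


def norm(x):
    """Length of a real vector."""
    return math.sqrt(sum(v * v for v in x))


# A*A = [[25, 20], [20, 25]] has orthonormal eigenvectors (1, 1)/sqrt(2) and
# (1, -1)/sqrt(2) with eigenvalues 45 and 5 (computed by hand).
s = 1 / math.sqrt(2)
x_basis = [[s, s], [s, -s]]

rho = [norm(apply(A, x)) for x in x_basis]              # |A x_k| = rho_k
y_basis = [[v / r for v in apply(A, x)]                 # normalized A x_k
           for x, r in zip(x_basis, rho)]
back = [apply(transpose(A), y) for y in y_basis]        # A* y_k
```

For this sample both singular values are nonzero, so $t = m = n = 2$ and no vectors from the kernels $N$ or $N^*$ have to be added; each `back[k]` should coincide with `rho[k]` times `x_basis[k]`.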

If $X$ and $Y$ are distinct spaces, then the matrix of $A$ can be written
in singular bases. Denote it by $\Lambda$. By (78.2) it is as follows:

$$
\Lambda =
\begin{pmatrix}
\rho_1 & & & & \\
& \rho_2 & & & 0 \\
& & \ddots & & \\
& & & \rho_t & \\
& 0 & & & 0
\end{pmatrix}
\qquad (78.4)
$$

If $X$ and $Y$ coincide, then singular bases are not used as a rule to
write the matrix of an operator. Relations (78.2) and (78.3) hold
again, however.


Exercises 

1. Prove that it is the kernels of operators A, A* A (A*, 
AA*) and the ranges of A, AA* (A*, A*A) that coincide. 

2. Prove that if dim X > dim Y (dim X < dim Y), then A*A (.4.4*) is 
a singular operator. 
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3. Prove that singular values remain unaffected by multiplication of an 
operator A by any unitary operators. 

4. Let A be an operator in a space X and let all its singular values be mutu¬ 
ally distinct. Prove that singular bases are uniquely defined up to multiplica¬ 
tion of each of the vectors by a number equal to unity in absolute value. 

5. Prove that the singular values of a normal operator coincide with the 
moduli of its eigenvalues. 

6. Prove that the singular values of an operator $A^{-1}$ are inverses of the
singular values of an operator A and that the singular bases of both operators
coincide.

7. Let A be an operator in an m-dimensional unitary space X. Denote by
$\lambda_1, \ldots, \lambda_m$ its eigenvalues and by $\rho_1, \ldots, \rho_m$ its singular values. Prove that

$$\sum_{k=1}^{m} |\lambda_k|^2 \le \sum_{k=1}^{m} \rho_k^2, \qquad \prod_{k=1}^{m} |\lambda_k| = \prod_{k=1}^{m} \rho_k.$$

8. Prove that if $|\lambda_k| = \rho_k$ for all k = 1, 2, ..., m, then the operator is normal.

79. Decomposition of 
an arbitrary operator 

One of the circumstances determining the
significance of the unitary and the Hermitian operator is the
possibility of using them to represent an arbitrary linear operator.

Let A be a linear operator in a unitary space X. We show that it 
can always be represented as 

$$A = H_1 + iH_2, \tag{79.1}$$

where $H_1$ and $H_2$ are Hermitian operators. Indeed, if this decomposition
exists,

$$A^* = H_1 - iH_2.$$

But then

$$H_1 = \frac{1}{2}(A + A^*), \qquad H_2 = \frac{1}{2i}(A - A^*).$$

It is these formulas that define decomposition (79.1). Since

$$H_1 H_2 - H_2 H_1 = \frac{1}{2i}(A^*A - AA^*),$$

the normality of A implies that $H_1$ and $H_2$ are commutative, and
vice versa.
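
The decomposition (79.1) is easy to check numerically. The following is a minimal sketch in Python using only nested lists; the sample matrix and all helper names (`adjoint`, `combine`, `hermitian_parts` and so on) are assumptions made for this illustration and do not come from the text.

```python
# Sketch: the decomposition A = H1 + i*H2 of (79.1) for a 2 x 2 complex matrix.
# Matrices are nested lists; all helper names are illustrative.

def adjoint(a):
    """Conjugate transpose of a square matrix."""
    n = len(a)
    return [[a[j][i].conjugate() for j in range(n)] for i in range(n)]

def combine(a, b, beta):
    """Entrywise a + beta * b."""
    n = len(a)
    return [[a[i][j] + beta * b[i][j] for j in range(n)] for i in range(n)]

def scale(c, a):
    return [[c * v for v in row] for row in a]

def matmul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def close(a, b, tol=1e-12):
    n = len(a)
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(n) for j in range(n))

def hermitian_parts(a):
    """Return H1 = (A + A*)/2 and H2 = (A - A*)/(2i)."""
    a_star = adjoint(a)
    h1 = scale(0.5, combine(a, a_star, 1))
    h2 = scale(1 / 2j, combine(a, a_star, -1))
    return h1, h2

A = [[1 + 2j, 3 + 0j], [1j, 2 - 1j]]        # sample matrix, not normal
A_star = adjoint(A)
H1, H2 = hermitian_parts(A)

# Left and right sides of H1 H2 - H2 H1 = (A*A - AA*)/(2i).
commutator = combine(matmul(H1, H2), matmul(H2, H1), -1)
normality_defect = scale(1 / 2j, combine(matmul(A_star, A), matmul(A, A_star), -1))

print(close(adjoint(H1), H1), close(adjoint(H2), H2))  # both parts are Hermitian
print(close(combine(H1, H2, 1j), A))                   # A = H1 + i*H2
print(close(commutator, normality_defect))             # commutator identity
```

Since the sample matrix is not normal, the commutator of $H_1$ and $H_2$ is nonzero here, in agreement with the statement above.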

Let $x_1, \ldots, x_m$ be an orthonormal system of eigenvectors of the
operator A*A. According to (78.2) there is an orthonormal system
$y_1, \ldots, y_m$ of eigenvectors of AA* such that

$$Ax_k = \rho_k y_k \tag{79.2}$$

for all k. Now let linear operators F and U be defined in a space X
by the following equations on basis systems of vectors:

$$Ux_k = y_k, \qquad Fy_k = \rho_k y_k. \tag{79.3}$$
Relations (79.2) and (79.3) imply that the following decomposition
is obtained:

$$A = FU. \tag{79.4}$$

Here F is a nonnegative Hermitian operator, since it has an
orthonormal basis of eigenvectors $y_1, y_2, \ldots, y_m$ and nonnegative
eigenvalues $\rho_1, \rho_2, \ldots, \rho_m$. The operator U is unitary, since it
carries the orthonormal system of vectors $x_1, x_2, \ldots, x_m$ into the
orthonormal system $y_1, y_2, \ldots, y_m$. Note that (79.4) yields

$$AA^* = F^2, \tag{79.5}$$

i.e. F is the principal square root of AA*.

Decomposition (79.4) is called a polar factorization of an operator A.
By virtue of the uniqueness of the principal root the operator F in a
polar factorization is always unique. The operator U is unique
only when the operator A is nonsingular. In that case $U = F^{-1}A$.
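
For a nonsingular real 2 × 2 matrix the polar factorization can be computed directly from $F = (AA^*)^{1/2}$ and $U = F^{-1}A$. The sketch below is only an illustration: it assumes a real matrix (so that the adjoint is the transpose), and it takes the square root by a closed form valid for symmetric positive definite 2 × 2 matrices, which is a device of this example and not the construction used in the text. All function names are hypothetical.

```python
import math

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def transpose(a):
    return [[a[0][0], a[1][0]], [a[0][1], a[1][1]]]

def inverse(a):
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]

def principal_sqrt(m):
    """Principal square root of a symmetric positive definite 2 x 2 matrix M,
    by the closed form (M + s*E) / t with s = sqrt(det M), t = sqrt(tr M + 2*s)."""
    s = math.sqrt(m[0][0] * m[1][1] - m[0][1] * m[1][0])
    t = math.sqrt(m[0][0] + m[1][1] + 2 * s)
    return [[(m[0][0] + s) / t, m[0][1] / t], [m[1][0] / t, (m[1][1] + s) / t]]

def polar(a):
    """Return (F, U) with A = F U, where F = sqrt(A A') and U = F^{-1} A."""
    f = principal_sqrt(matmul(a, transpose(a)))
    u = matmul(inverse(f), a)
    return f, u

def close(a, b, tol=1e-9):
    return all(abs(a[i][j] - b[i][j]) <= tol for i in range(2) for j in range(2))

A = [[3.0, 0.0], [4.0, 5.0]]           # sample nonsingular matrix
E = [[1.0, 0.0], [0.0, 1.0]]
F, U = polar(A)

print(close(matmul(F, U), A))                        # A = F U
print(close(matmul(F, F), matmul(A, transpose(A))))  # F^2 = A A'
print(close(matmul(U, transpose(U)), E))             # U is orthogonal
```

In the real case the unitary factor U is an orthogonal matrix, which is what the last check confirms.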

Again there is a direct connection between the normality of an 
operator A and the commutativity of the components of its polar 
factorization. Indeed, let UF = FU for some operator A. Then 

$$A^*A = U^*F^*FU = F^*U^*UF = F^2,$$

which together with (79.5) means that the operator A is normal.

Suppose now that A is a normal operator, i.e. that A*A = AA*.
By (79.4) A = FU. Hence A* = U*F. The normality condition of
the operator leads to $U^*F^2U = F^2$ or

$$F^2U = UF^2.$$

Taking into account the second of the relations (79.3) we get

$$F^2(Uy_k) = \rho_k^2 (Uy_k)$$

for all k = 1, 2, ..., m, i.e. $Uy_k$ are eigenvectors for the operator
$F^2$. As noted earlier, $F^2$ and F have the same eigenvectors. Therefore

$$(FU) y_k = F(Uy_k) = \rho_k (Uy_k)$$

for all k = 1, 2, ..., m. On the other hand, by the second of the
relations (79.3)

$$(UF) y_k = U(Fy_k) = U(\rho_k y_k) = \rho_k (Uy_k).$$

These equations show that the operators FU and UF coincide on
the basis system of vectors $y_1, y_2, \ldots, y_m$. Hence UF = FU.

Exercises 

1. Prove that if an operator is normal, then the eigenvalues
of the operator $H_1$ ($H_2$) of (79.1) are the real (imaginary) parts of the
eigenvalues of the operator A.

2. Prove that if A is a normal operator, then the eigenvalues of the
operator F (the arguments of the eigenvalues of the operator U) of (79.4)
are the absolute values of the eigenvalues (the arguments of the
nonzero eigenvalues) of A.

3. Prove that if the operator A is normal, then both operators in
decomposition (79.1) have the same eigenvectors as A. What can be said about the
eigenvectors of the components of decomposition (79.4)?


80. Operators in the real space 

Additional difficulties arise in investigating 
linear operators in a real space. They are mainly due to the fact that 
not every linear operator in a real space has at least one eigenvector. 

Of course, if the characteristic polynomial of an operator in a
real space has only real roots, then the theory closely parallels the
complex case. In fact only the terminology changes. That is, the words
“complex, unitary, Hermitian” are replaced respectively by “real, orthogonal,
symmetric”. If, however, the characteristic polynomial has, in
addition, complex roots, then the study of such an operator becomes a
more complicated matter.

Let the real space R be given. Consider the set of all possible pairs
(x; y) of vectors x and y from R. We define operations on those pairs.
It is assumed that

$$(x; y) + (u; v) = (x + u;\ y + v)$$

for any two pairs and that for any complex number $\xi + i\eta$ and any
pair (x; y)

$$(\xi + i\eta)(x; y) = (\xi x - \eta y;\ \eta x + \xi y).$$

It is easy to verify that the set of all pairs of vectors from R with the
operations thus introduced is a complex space C.
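
A short check, added here as an illustration, shows why the multiplication rule is the natural one: it makes the pair (x; y) behave like the formal sum x + iy. Taking $\xi = 0$, $\eta = 1$ in the definition gives the following.

```latex
i\,(x; y) = (-y;\ x), \qquad
i\,\bigl(i\,(x; y)\bigr) = i\,(-y;\ x) = (-x;\ -y) = -(x; y),

(x; 0) + i\,(y; 0) = (x; 0) + (0;\ y) = (x; y).
```

So multiplication by i applied twice coincides with multiplication by -1, as it must in a complex space.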

The constructed space C has the same dimension as the space R.
Indeed, let $e_1, e_2, \ldots, e_m$ be a basis in R. For any pair of vectors u
and v from R we have

$$u = \alpha_1 e_1 + \ldots + \alpha_m e_m, \qquad v = \beta_1 e_1 + \ldots + \beta_m e_m, \tag{80.1}$$

where $\alpha_k$ and $\beta_k$ are real numbers. But it follows that

$$(u; v) = \sum_{k=1}^{m} (\alpha_k + i\beta_k)(e_k; 0). \tag{80.2}$$

The system $(e_1; 0), \ldots, (e_m; 0)$ is linearly independent. Therefore
the dimension of C is equal to m.

For any basis $e_1, \ldots, e_m$ in R and any real numbers $\alpha_1, \ldots, \alpha_m$

$$\sum_{k=1}^{m} (\alpha_k + i0)(e_k; 0) = \Bigl(\sum_{k=1}^{m} \alpha_k e_k;\ 0\Bigr).$$

Hence there is a 1-1 correspondence between all vectors u from R
and all pairs of the form (u; 0) from C. Moreover, this correspondence
is an isomorphism if restricted to the operations with real numbers.

If all pairs of the form (u; 0) are identified with vectors u from R,
then it follows from (80.1) and (80.2) that the space C may be
considered as a set of elements

$$w = u + iv,$$

where $u, v \in R$. It should be remembered, of course, that in fact the
elements u and v are pairs (u; 0) and (v; 0) and that multiplication by
a number i and addition are carried out according to the definitions
introduced above. When v = 0 we obtain elements of R. It is natural
to consider R to be a subset of C. Elements of the form u + i0
will be called real and elements u + iv and u - iv complex conjugate.
The space C is called the complexification of the real space R.

In solving various problems in a Euclidean space we can proceed
in a similar way to obtain a unitary space. Consider the
complexification C of a Euclidean space R. For any two vectors

$$z = x + iy, \qquad w = u + iv$$

from C it is assumed by definition that

$$(z, w) = \bigl((x, u) + (y, v)\bigr) + i\bigl((y, u) - (x, v)\bigr).$$

It is not hard to establish that the space C with such a scalar product
is unitary. The scalar product for any two vectors from R is
preserved.

Let A be an operator in R. Construct a new operator $\hat A$ in C equal
to A on R. To do this we set

$$\hat A(u + iv) = Au + iAv.$$

It is clear that $\hat A$ is a linear operator and that $\hat A u = Au$ for every
vector $u \in R$.

The operator $\hat A$ is called the complexification of the operator A.
Now instead of studying the operator A in the real space R it is
possible to consider the operator $\hat A$ in the complex space C and
investigate it in R as a subset of C. This device is most often used when some
fact in the complex space has no analogue in the real space.

Suppose that a real basis is given in C. Then in that basis the
matrix of the complexification $\hat A$ is real and coincides with the matrix
of the operator A in the same basis. It follows that the characteristic
polynomial of $\hat A$ coincides with that of A and hence has real
coefficients. It is obvious that

If the characteristic polynomial of an operator A in the real space R
has a real root, then that root is an eigenvalue of A and has at least one
real eigenvector corresponding to it.

Consider now some complex root $\lambda$ of the characteristic polynomial
of A. It is an eigenvalue of $\hat A$ and has some eigenvector w
corresponding to it. Since the characteristic polynomial of $\hat A$ has real
coefficients, $\hat A$ will also have the complex conjugate eigenvalue $\bar\lambda$. The
operator $\hat A$ carries complex conjugate vectors into complex conjugate
vectors. Therefore it follows from $\hat A w = \lambda w$ that $\hat A \bar w = \bar\lambda \bar w$. Hence
the complex conjugate eigenvalues of $\hat A$ have the corresponding
complex conjugate eigenvectors.

If $\lambda \ne \bar\lambda$, then the vectors w and $\bar w$ are linearly independent as
eigenvectors corresponding to distinct eigenvalues.

Consider vectors x and y defined as follows in terms of w and $\bar w$:

$$x = \frac{1}{2}(w + \bar w), \qquad y = \frac{1}{2i}(w - \bar w). \tag{80.3}$$

It is easy to verify that they are real. Moreover, it is not hard to see
that if $\lambda = \mu + i\nu$, then

$$Ax = \mu x - \nu y, \qquad Ay = \nu x + \mu y.$$

Therefore the span in R constructed on vectors (80.3) is an invariant
subspace of A. The matrix of the induced operator on that subspace
in basis (80.3) is as follows:

$$\begin{pmatrix} \mu & \nu \\ -\nu & \mu \end{pmatrix}.$$

Hence the characteristic polynomial of the induced operator is
$(z - \mu)^2 + \nu^2$ or equivalently $z^2 - (\lambda + \bar\lambda) z + \lambda\bar\lambda$. Note that in the
invariant subspace constructed A has no eigenvector for $\nu \ne 0$.
Thus we have arrived at an important conclusion. Namely:

If the characteristic polynomial of an operator A in the real space R 
has a complex (not real!) root, then that root has in R a corresponding 
two-dimensional invariant subspace of A containing no eigenvectors. 
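
The construction of the invariant subspace can be traced on a real 2 × 2 matrix. The sketch below is an illustration under stated assumptions: the sample matrix and the helper names `complex_eigenpair` and `apply` are chosen for this example, and the eigenvector is found from the explicit formula w = (q, λ - p) for a matrix with rows (p, q) and (r, s), which is valid because q is nonzero whenever the roots are not real.

```python
import cmath

def complex_eigenpair(a):
    """Eigenvalue with positive imaginary part, and an eigenvector, of a real
    2 x 2 matrix whose characteristic polynomial has non-real roots."""
    (p, q), (r, s) = a
    tr = p + s
    det = p * s - q * r
    disc = tr * tr - 4 * det
    if disc >= 0:
        raise ValueError("the characteristic polynomial has real roots")
    lam = (tr + cmath.sqrt(disc)) / 2
    w = (complex(q), lam - p)      # satisfies A w = lam * w
    return lam, w

def apply(a, v):
    return [a[0][0] * v[0] + a[0][1] * v[1], a[1][0] * v[0] + a[1][1] * v[1]]

A = [[1.0, -2.0], [1.0, 3.0]]      # sample matrix with roots 2 + i and 2 - i
lam, w = complex_eigenpair(A)
mu, nu = lam.real, lam.imag

# Vectors (80.3): x = (w + conj w)/2 and y = (w - conj w)/(2i),
# i.e. the real and imaginary parts of w.
x = [c.real for c in w]
y = [c.imag for c in w]

Ax, Ay = apply(A, x), apply(A, y)
expected_Ax = [mu * x[k] - nu * y[k] for k in range(2)]
expected_Ay = [nu * x[k] + mu * y[k] for k in range(2)]

print(Ax, expected_Ax)
print(Ay, expected_Ay)
```

The two printed pairs coincide, so the span of x and y is carried into itself by A.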

This conclusion is as important for the study of operators in a
real space as is the fact of the existence of at least one eigenvector for
the study of operators in a complex space. Choosing in a suitable way
bases in the space R we can reduce the matrix of an operator to a form
resembling in a sense either the diagonal form or the triangular form
or the Jordan canonical form. This method of investigating the
operator is employed comparatively rarely, since real canonical forms
lack many merits of complex canonical forms. It is much easier and
more fruitful to investigate the complexification of an operator.
Exercises

1. Prove that the range (the kernel) of an operator $\hat A$
is a complexification of the range (the kernel) of an operator A.

2. Let a complexification $\hat A$ have a simple structure. Prove that a basis can
be chosen in R such that the matrix of an operator A has a quasi-diagonal form
with 1 × 1 and 2 × 2 matrices along the diagonal.

3. Prove that in the real space R of dimension m any operator has an
invariant subspace of dimension m - 1 or m - 2.

4. What is the counterpart of Theorem 72.1 in a real space? 

5. Prove that any linear operator in a real space of odd dimension has at 
least one eigenvector. 


81. Matrices of a special form 

We have discussed some operators of a special 
form. It is natural to suggest that the matrices of those operators 
should also have some specificity. 

A square complex matrix U is said to be unitary if its adjoint U*
coincides with its inverse $U^{-1}$, i.e.

UU* = U*U = E.

We recall that in an orthonormal basis the adjoint operator has a
corresponding adjoint matrix. Hence the matrix of a unitary
operator in an orthonormal basis is unitary.

Suppose that in a unitary space any two orthonormal bases are 
given. We construct a coordinate transformation matrix for a change 
from one of the bases to the other. According to (63.3) the matrix 
columns are made up of the coordinates of the vectors of the second 
basis relative to the first. But of the same form is also the matrix of 
a linear operator transforming the vectors of the first basis into those 
of the second. According to the second corollary of Theorem 77.2 
that operator is unitary. Therefore 

A coordinate transformation matrix for a change from an orthonormal 
basis to an orthonormal basis is unitary. 

We shall say that two matrices are unitarily similar if they are
similar and the similarity transformation matrix is unitary. It follows
from the properties of the unitary operator that any unitary matrix is
unitarily similar to a diagonal matrix with diagonal elements equal
to unity in absolute value.

It is easy to write the relations defining the elements of a unitary
matrix. Let U be an m × m matrix. We denote by $u_{ij}$ its elements.
Then it follows from UU* = E that

$$\sum_{k=1}^{m} u_{ik} \bar u_{jk} = \begin{cases} 0 & \text{if } i \ne j, \\ 1 & \text{if } i = j. \end{cases}$$

Similarly from U*U = E we get

$$\sum_{k=1}^{m} \bar u_{ki} u_{kj} = \begin{cases} 0 & \text{if } i \ne j, \\ 1 & \text{if } i = j. \end{cases}$$

Thus the systems of row vectors and column vectors of any unitary
matrix are orthonormal systems.
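
The two systems of relations are straightforward to verify for a concrete matrix. The following sketch is an illustration only; the sample matrices and the names `row_product`, `column_product` and `is_orthonormal` are assumptions of this example.

```python
import math

def row_product(u, i, j):
    """Sum over k of u_ik * conj(u_jk): scalar product of rows i and j."""
    return sum(u[i][k] * u[j][k].conjugate() for k in range(len(u)))

def column_product(u, i, j):
    """Sum over k of conj(u_ki) * u_kj: scalar product of columns i and j."""
    return sum(u[k][i].conjugate() * u[k][j] for k in range(len(u)))

def is_orthonormal(product, u, tol=1e-12):
    n = len(u)
    return all(abs(product(u, i, j) - (1 if i == j else 0)) <= tol
               for i in range(n) for j in range(n))

s = 1 / math.sqrt(2)
U = [[s, 1j * s], [1j * s, s]]      # sample unitary matrix
T = [[1.0, 1.0], [0.0, 1.0]]        # sample matrix that is not unitary

rows_ok = is_orthonormal(row_product, U)
columns_ok = is_orthonormal(column_product, U)
print(rows_ok, columns_ok)
print(is_orthonormal(row_product, T))
```

For the matrix T the rows are not orthonormal, so the check fails, as expected for a matrix that is not unitary.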

A real unitary matrix U is called orthogonal. It is defined by the 
following relations: 


UU' = U'U = E. 

All properties of orthogonal matrices follow from those of unitary 
matrices. 

A square complex matrix H is said to be Hermitian or self-adjoint 
if it coincides with its adjoint, i.e. 

H = H*. 

Thus the matrix of a Hermitian operator in an orthonormal basis is 
Hermitian. 

It follows from the properties of the Hermitian operator that any
Hermitian matrix is unitarily similar to a real diagonal matrix. If
$h_{ij}$ are elements of the Hermitian matrix H, then

$$h_{ij} = \bar h_{ji}$$

for all i and j. It follows in particular that the diagonal elements of
any Hermitian matrix are real.

A real Hermitian matrix H is called symmetric. It is defined by 
the following relation: 

H = H'. 

Note that any symmetric matrix is orthogonally similar to a real
diagonal matrix.

A square matrix is said to be normal if it is commutative with its 
adjoint. 

According to this definition the matrix of a normal operator in an 
orthonormal basis is normal. Taking into account the properties of 
a normal operator it is easy to see that any complex normal matrix 
is unitarily similar to a diagonal matrix. 

Matrices of a special form play an important role in constructing
various computational algorithms. Nevertheless we shall not be
concerned with their detailed study. All the properties of these matrices
are virtually a reflection of similar properties of corresponding
operators.
Exercises

1. Prove that any complex matrix is unitarily
similar to a triangular matrix.

2. Let $\lambda_1, \ldots, \lambda_m$ be the eigenvalues of a matrix A, each eigenvalue
repeated according to multiplicity. Prove that

$$\sum_{j=1}^{m} |\lambda_j|^2 \le \operatorname{tr}(A^*A). \tag{81.1}$$

3. Prove that equality holds in (81.1) if and only if the matrix A is normal.

4. Using the Binet-Cauchy formula prove that for any matrix A the
principal minors of the matrix A*A are nonnegative.

5. Prove that the sum of the squares of the absolute values of all minors of
a unitary matrix in any fixed rows and columns is equal to unity.

6. Prove that any rectangular matrix A can be represented as $A = Q\Lambda S$,
where Q and S are unitary matrices and $\Lambda$ is a diagonal matrix with nonnegative
elements.
Metric Properties
of an Operator 


82. The continuity and boundedness 
of an operator 

We have introduced the concept of linear
operator as some generalization of the notion of function. Assuming that
a metric is defined in the spaces, it is possible to draw an analogy with
the boundedness of a function, the continuity of a function, etc.
When studying these questions we shall always assume that the
operator acts from an m-dimensional normed space X to an n-dimensional
normed space Y. If X does not coincide with Y, then the norms in
both spaces can be introduced independently of each other.

An operator A from X to Y is said to be continuous at a point $x_0 \in X$
if the condition $x_k \to x_0$ implies $Ax_k \to Ax_0$ for any sequence
$\{x_k\}$ in X. If the operator is continuous at each point of X, then it
is said to be everywhere continuous or simply continuous.

Theorem 82.1. A linear operator in arbitrary finite dimensional 
normed spaces is continuous. 

Proof. We take a vector $x_0 \in X$ and choose any basis $e_1, e_2, \ldots, e_m$
in X. We have

$$x_0 = \xi_1^{(0)} e_1 + \ldots + \xi_m^{(0)} e_m.$$

Suppose $x_k \to x_0$ and

$$x_k = \xi_1^{(k)} e_1 + \ldots + \xi_m^{(k)} e_m.$$

By Theorem 53.1 convergence in the norm implies coordinate
convergence. Therefore $\xi_s^{(k)} \to \xi_s^{(0)}$ for every s. But

$$Ax_0 = \xi_1^{(0)} Ae_1 + \ldots + \xi_m^{(0)} Ae_m$$

and in addition

$$Ax_k = \xi_1^{(k)} Ae_1 + \ldots + \xi_m^{(k)} Ae_m.$$

Now the convergence $\xi_s^{(k)} \to \xi_s^{(0)}$ for every s will imply the
convergence $Ax_k \to Ax_0$ in the norm of Y.

An operator A is said to be bounded if there is a constant M such
that $\|Ax\| \le M \|x\|$ for any vector $x \in X$.

Theorem 82.2. A linear operator in arbitrary finite dimensional 
normed spaces is bounded. 
Proof. Suppose an operator A is not bounded in some case. Then
there is a sequence $\{x_k\}$ of nonzero vectors such that

$$\|Ax_k\| \ge k \|x_k\|.$$

Consider a sequence of vectors

$$y_k = \frac{1}{k \|x_k\|}\, x_k.$$

It converges to zero, since

$$\|y_k\| = \frac{1}{k}.$$

On the other hand,

$$\|Ay_k\| = \frac{\|Ax_k\|}{k \|x_k\|} \ge 1.$$

This means that $\{Ay_k\}$ does not converge to zero, i.e. that A is
not continuous at zero. This contradiction with Theorem 82.1
completes the proof.

It is natural to pose the question concerning the smallest of the
constants M satisfying $\|Ax\| \le M \|x\|$ for all vectors x. Since
the set of those constants is bounded below by zero, the smallest
constant clearly exists. It is called the norm of the operator A and
designated $\|A\|$. By definition the norm of an operator has the following
two properties:

(1) for any vector x in X

$$\|Ax\| \le \|A\| \cdot \|x\|, \tag{82.1}$$

(2) for every number $\varepsilon > 0$ there is a vector $x_\varepsilon \in X$ such that

$$\|Ax_\varepsilon\| > (\|A\| - \varepsilon)\, \|x_\varepsilon\|. \tag{82.2}$$

We prove that

$$\|A\| = \sup_{\|x\| \le 1} \|Ax\| \tag{82.3}$$

or equivalently that

$$\|A\| = \sup_{x \ne 0} \frac{\|Ax\|}{\|x\|} \tag{82.4}$$

if of course dim X > 0.

We take a vector x satisfying $\|x\| \le 1$. Then it follows from (82.1)
that

$$\|Ax\| \le \|A\| \, \|x\| \le \|A\|.$$

Consequently

$$\sup_{\|x\| \le 1} \|Ax\| \le \|A\|. \tag{82.5}$$
We further take any vector $x_\varepsilon$ according to (82.2) and construct a
vector

$$y_\varepsilon = \frac{1}{\|x_\varepsilon\|}\, x_\varepsilon.$$

Then

$$\|Ay_\varepsilon\| = \frac{1}{\|x_\varepsilon\|} \|Ax_\varepsilon\| > \frac{1}{\|x_\varepsilon\|} (\|A\| - \varepsilon) \|x_\varepsilon\| = \|A\| - \varepsilon.$$

Since $\|y_\varepsilon\| = 1$, we have

$$\sup_{\|x\| \le 1} \|Ax\| \ge \|Ay_\varepsilon\| > \|A\| - \varepsilon.$$

By virtue of the arbitrariness of $\varepsilon$ we get

$$\sup_{\|x\| \le 1} \|Ax\| \ge \|A\|. \tag{82.6}$$

Now from (82.5) and (82.6) we obtain relation (82.3) which was to
be established.

We shall soon show that the norm of an operator plays an exceptionally
important role in introducing a metric in the space of linear operators.
It is the explicit form (82.3) that will be essential.


Exercises 

1. Prove that on a bounded closed set of vectors the
supremum and infimum of the norms of the values of a linear operator are
attained.

2. Prove that a linear operator carries any bounded closed set again into
a bounded closed set.

3. Is the assertion of the preceding exercise true if the boundedness
requirement of a set is dropped?

4. Prove that in (82.3) the supremum is attained on a set of vectors satisfying
$\|x\| = 1$ provided dim X > 0.

5. Let A be an operator in a space X. Prove that A is nonsingular if and
only if there is a number m > 0 such that $\|Ax\| \ge m \|x\|$ for any vector $x \in X$.

83. The norm of an operator

The set $\omega_{XY}$ of linear operators from X to
Y is a finite dimensional vector space. If that space is real or
complex, then it can be converted into a complete metric space by
introducing a norm in it in some way.

To introduce a norm in a space of linear operators the same methods
can be used as those employed in any other vector space. Of most
interest in this case, however, are only the norms in $\omega_{XY}$ that are
sufficiently closely related to those in X and Y. One of the most
important classes of such norms is the class of the so-called compatible
norms.

If for each operator A of $\omega_{XY}$

$$\|Ax\| \le \|A\| \cdot \|x\|$$

for all $x \in X$, then the operator norm is said to be compatible with
the vector norms in X and Y.

The advantage of a compatible norm is easy to see from the
following example. Suppose that $\lambda$ is an eigenvalue of an operator A in
X and that x is the corresponding eigenvector. Then $Ax = \lambda x$ and
therefore

$$|\lambda| \cdot \|x\| = \|\lambda x\| = \|Ax\| \le \|A\| \cdot \|x\|.$$

Since $x \ne 0$, it follows that $|\lambda| \le \|A\|$. So we have obtained a very
important conclusion:

The moduli of the eigenvalues of a linear operator do not exceed any
of its compatible norms.

This example shows that to obtain the best estimates it is
desirable that the smallest of the compatible norms should be used. It is
clear that all compatible norms are bounded below by expression
(82.3). If we show that this expression satisfies the norm axioms,
then it will precisely be the smallest of the compatible norms. This
justifies both the name of expression (82.3) and the notation used.

It is obvious that for any operator A the expression $\|A\|$ is
nonnegative. If $\|A\| = 0$, i.e. if

$$\sup_{\|x\| \le 1} \|Ax\| = 0,$$

then $\|Ax\| = 0$ for every vector x whose norm does not exceed
unity. But then, by the linearity of the operator, Ax = 0 for every
x. Hence A = 0. For any operator A and any number $\lambda$ we have

$$\|\lambda A\| = \sup_{\|x\| \le 1} \|\lambda Ax\| = |\lambda| \sup_{\|x\| \le 1} \|Ax\| = |\lambda| \cdot \|A\|.$$

And finally for any two operators A and B in $\omega_{XY}$

$$\|A + B\| = \sup_{\|x\| \le 1} \|Ax + Bx\| \le \sup_{\|x\| \le 1} (\|Ax\| + \|Bx\|) \le \sup_{\|x\| \le 1} \|Ax\| + \sup_{\|x\| \le 1} \|Bx\| = \|A\| + \|B\|.$$

All these relations precisely mean that (82.3) is a norm in a space of
linear operators. Norm (82.3) is called an operator norm subordinate
to the vector norms in X and Y.

A subordinate norm has a very important property relative to the
operation of operator multiplication, too. Let A be an operator from X
to Y and B an operator from Y to Z. As is known, this defines an
operator BA. Considering the compatibility of subordinate norms
we find

$$\|BA\| = \sup_{\|x\| \le 1} \|(BA)x\| = \sup_{\|x\| \le 1} \|B(Ax)\| \le \sup_{\|x\| \le 1} (\|B\| \cdot \|Ax\|) = \|B\| \sup_{\|x\| \le 1} \|Ax\| = \|B\| \cdot \|A\|.$$

Thus any subordinate norm of an operator has the following four
basic properties. For any operators A and B and any number $\lambda$

$$\begin{aligned}
&(1)\ \|A\| > 0 \ \text{if}\ A \ne 0; \quad \|0\| = 0, \\
&(2)\ \|\lambda A\| = |\lambda| \, \|A\|, \\
&(3)\ \|A + B\| \le \|A\| + \|B\|, \\
&(4)\ \|BA\| \le \|B\| \cdot \|A\|.
\end{aligned} \tag{83.1}$$

To note a further property, for the identity operator E

$$(5)\ \|E\| = 1.$$

This follows from (82.3), since Ex = x for any vector x.

In the general case a subordinate norm of an operator depends
both on the norm in X and on the norm in Y. If both spaces are
unitary, then we may take as a norm in them the length of the vectors.
The corresponding subordinate norm of the operator is called the
spectral norm and designated $\|\cdot\|_2$. So for any operator A from X to Y

$$\|A\|_2^2 = \sup_{(x, x) \le 1} (Ax, Ax). \tag{83.2}$$

We investigate some properties of spectral norm. 

The spectral norm remains unaffected by the multiplication of an
operator by any unitary operators.

Let V and U be arbitrary unitary operators in X and Y
respectively. Consider the operator B = UAV. We have

$$\begin{aligned}
\|B\|_2^2 &= \sup_{(x, x) \le 1} (Bx, Bx) = \sup_{(x, x) \le 1} (UAVx, UAVx) \\
&= \sup_{(x, x) \le 1} (AVx, U^*UAVx) = \sup_{(x, x) \le 1} (AVx, AVx) \\
&= \sup_{(Vx, Vx) \le 1} (AVx, AVx) = \sup_{(v, v) \le 1} (Av, Av) = \|A\|_2^2.
\end{aligned}$$

Assigning a spectral norm in the form (83.2) establishes its
relation to singular values of the operator A. Let $x_1, x_2, \ldots, x_m$ be an
orthonormal system of eigenvectors of the operator A*A and let
$\rho_1^2, \rho_2^2, \ldots, \rho_m^2$ be its eigenvalues. It may be assumed without loss
of generality that

$$\rho_1 \ge \rho_2 \ge \ldots \ge \rho_m \ge 0. \tag{83.3}$$

We represent a vector $x \in X$ as

$$x = \alpha_1 x_1 + \ldots + \alpha_m x_m, \tag{83.4}$$





then

$$(x, x) = \sum_{i=1}^{m} |\alpha_i|^2.$$

As noted in Section 78, the system $x_1, x_2, \ldots, x_m$ is carried by the
operator A into an orthogonal system, with

$$(Ax_i, Ax_i) = \rho_i^2$$

for every i. Hence

$$(Ax, Ax) = \sum_{i=1}^{m} |\alpha_i|^2 \rho_i^2,$$

which yields

$$\|A\|_2^2 = \sup_{\sum_{i=1}^{m} |\alpha_i|^2 \le 1} \ \sum_{i=1}^{m} |\alpha_i|^2 \rho_i^2. \tag{83.5}$$

It is clear that under (83.3)

$$\|A\|_2^2 \le \rho_1^2.$$

But for the vector $x_1$ the right-hand side of (83.5) takes on the value $\rho_1^2$.
Therefore

$$\|A\|_2^2 = \rho_1^2.$$

Thus

The spectral norm of an operator A is equal to its maximum
singular value.

We recall that for a normal operator A its singular values coincide
with the moduli of its eigenvalues. Hence the spectral norm of a
unitary operator is equal to unity and the spectral norm of a
nonnegative operator is equal to its largest eigenvalue.
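
The equality between the spectral norm and the maximum singular value can be observed numerically. The sketch below is an illustration under assumptions: it treats a real 2 × 2 matrix in an orthonormal basis, finds the largest eigenvalue of A'A from the quadratic formula, and compares it with a crude estimate of the supremum in (83.2) obtained by sampling unit vectors. The function names and the sample matrix are chosen for this example.

```python
import math

def spectral_norm_2x2(a):
    """Largest singular value of a real 2 x 2 matrix: the square root of the
    largest eigenvalue of the symmetric matrix A'A."""
    (p, q), (r, s) = a
    g11 = p * p + r * r            # entries of A'A
    g12 = p * q + r * s
    g22 = q * q + s * s
    tr = g11 + g22
    det = g11 * g22 - g12 * g12
    return math.sqrt((tr + math.sqrt(max(tr * tr - 4 * det, 0.0))) / 2)

def sampled_sup(a, samples=3600):
    """Largest length of Ax over sampled unit vectors x = (cos t, sin t)."""
    best = 0.0
    for k in range(samples):
        t = 2 * math.pi * k / samples
        x0, x1 = math.cos(t), math.sin(t)
        best = max(best, math.hypot(a[0][0] * x0 + a[0][1] * x1,
                                    a[1][0] * x0 + a[1][1] * x1))
    return best

A = [[3.0, 0.0], [4.0, 5.0]]       # sample matrix with eigenvalues 3 and 5
rho_1 = spectral_norm_2x2(A)
estimate = sampled_sup(A)

print(rho_1, estimate)
```

For this matrix A'A has eigenvalues 45 and 5, so the maximum singular value is the square root of 45. Both eigenvalues of A, namely 3 and 5, do not exceed it in modulus, in agreement with the general bound for compatible norms.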

Exercises 

1. Prove that for any eigenvalue $\lambda$ of an operator A

$$|\lambda| \le \inf_{k} \|A^k\|^{1/k}.$$

2. Let $\varphi(z)$ be any polynomial with nonnegative coefficients. Prove that

$$\|\varphi(A)\| \le \varphi(\|A\|).$$

3. Prove that $\|A\| \ge \|A^{-1}\|^{-1}$ for any nonsingular operator A. When does
equality hold in the spectral norm case?
84. Matrix norms of an operator

The spectral norm is virtually the only
subordinate norm of an operator the calculation of which is not explicitly
connected with bases. If, however, in spaces, in which operators
are given, some bases are fixed, the possibility of introducing
operator norms is greatly extended.

So we again consider linear operators from a space X to a space Y.
Suppose we fix a basis $e_1, e_2, \ldots, e_m$ in X and a basis $q_1, q_2, \ldots, q_n$
in Y. Expanding a vector $x \in X$ with respect to a basis we get

$$x = x_1 e_1 + \ldots + x_m e_m. \tag{84.1}$$

Now it is possible to introduce a norm in X by formula (52.3), for
example, or in some other way in terms of the coefficients of
expansion (84.1). Similarly it is possible to introduce a norm in Y.

The most common are norms of the form (52.4). Therefore we shall
study operator norms subordinate to, and compatible with, those
norms. Moreover, it will be assumed that norms of the same type
are introduced in both X and Y. It is obvious that the corresponding
norms of an operator A must be somehow related to the elements $a_{ij}$
of the matrix of the operator in the chosen basis.

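We first establish expressions for operator norms subordinate to
the 1-norms and ∞-norms of (52.4). We have

$$\begin{aligned}
\|A\|_1 &= \sup_{\|x\|_1 \le 1} \|Ax\|_1 = \sup_{\|x\|_1 \le 1} \Bigl( \sum_{i=1}^{n} \Bigl| \sum_{j=1}^{m} a_{ij} x_j \Bigr| \Bigr) \\
&\le \sup_{\|x\|_1 \le 1} \Bigl( \sum_{i=1}^{n} \sum_{j=1}^{m} |a_{ij}| \, |x_j| \Bigr) = \sup_{\|x\|_1 \le 1} \Bigl( \sum_{j=1}^{m} |x_j| \sum_{i=1}^{n} |a_{ij}| \Bigr) \\
&\le \Bigl( \max_{1 \le j \le m} \sum_{i=1}^{n} |a_{ij}| \Bigr) \Bigl( \sup_{\|x\|_1 \le 1} \|x\|_1 \Bigr) = \max_{1 \le j \le m} \sum_{i=1}^{n} |a_{ij}|.
\end{aligned}$$

We now show that for some vector x satisfying the condition $\|x\|_1 \le 1$,
$\|Ax\|_1$ coincides with the right-hand side of the relation
obtained.

Let the largest value at the right be reached when j = l. Then all
the inequalities become equations, for $x = e_l$, for example. So

$$\|A\|_1 = \max_{1 \le j \le m} \sum_{i=1}^{n} |a_{ij}|.$$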

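Similarly for the other norm:

$$\begin{aligned}
\|A\|_\infty &= \sup_{\|x\|_\infty \le 1} \|Ax\|_\infty = \sup_{\|x\|_\infty \le 1} \Bigl( \max_{1 \le i \le n} \Bigl| \sum_{j=1}^{m} a_{ij} x_j \Bigr| \Bigr) \\
&\le \sup_{\|x\|_\infty \le 1} \Bigl( \max_{1 \le i \le n} \sum_{j=1}^{m} |a_{ij}| \, |x_j| \Bigr) \le \sup_{\|x\|_\infty \le 1} \Bigl( \Bigl( \max_{1 \le i \le n} \sum_{j=1}^{m} |a_{ij}| \Bigr) \Bigl( \max_{1 \le j \le m} |x_j| \Bigr) \Bigr) \\
&= \Bigl( \max_{1 \le i \le n} \sum_{j=1}^{m} |a_{ij}| \Bigr) \Bigl( \sup_{\|x\|_\infty \le 1} \|x\|_\infty \Bigr) = \max_{1 \le i \le n} \sum_{j=1}^{m} |a_{ij}|.
\end{aligned}$$

Suppose the largest value at the right is reached when i = l. We take
a vector x with coordinates $x_j = |a_{lj}| / a_{lj}$, if $a_{lj} \ne 0$, and with
$x_j = 1$, if $a_{lj} = 0$. It is not hard to verify that for that vector all
the inequalities become equations. Hence

$$\|A\|_\infty = \max_{1 \le i \le n} \sum_{j=1}^{m} |a_{ij}|.$$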

To find an operator norm subordinate to the 2-norms of (52.4) we
proceed as follows. We introduce in X and Y a scalar product in a
way similar to (32.1). Then the 2-norms of (52.4) will coincide with
the length of the vector. Therefore the subordinate norm is nothing
but the spectral norm of an operator corresponding to the given
scalar product. The bases for the chosen scalar products become
orthonormal and therefore in these bases an adjoint operator will have
a corresponding adjoint matrix. If we let $A_{qe}$ denote the matrix
of an operator A, then it follows from the foregoing that

The operator norm subordinate to 2-norms is equal to the maximum
singular value of $A_{qe}$.
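
The formulas for the norms subordinate to the 1-norms and ∞-norms translate directly into code: the first is the largest column sum of moduli, the second the largest row sum. The following sketch is an illustration only; the sample 2 × 3 matrix and all function names are assumptions of this example. It also builds the vectors on which the two suprema are attained, following the argument in the text.

```python
def norm_1(a):
    """Operator norm subordinate to the vector 1-norm: largest column sum."""
    return max(sum(abs(row[j]) for row in a) for j in range(len(a[0])))

def norm_inf(a):
    """Operator norm subordinate to the vector infinity-norm: largest row sum."""
    return max(sum(abs(v) for v in row) for row in a)

def apply(a, x):
    return [sum(v * xj for v, xj in zip(row, x)) for row in a]

def vec_norm_1(x):
    return sum(abs(v) for v in x)

def vec_norm_inf(x):
    return max(abs(v) for v in x)

A = [[1, -2, 3], [-4, 5, -6]]      # n = 2 rows, m = 3 columns

# 1-norm: the supremum is attained on the basis vector e_l, where l is the
# index of the column with the largest sum of moduli.
col_sums = [sum(abs(row[j]) for row in A) for j in range(3)]
l = col_sums.index(max(col_sums))
e_l = [1 if j == l else 0 for j in range(3)]

# Infinity-norm: take the row with the largest sum of moduli and the vector
# with coordinates |a_lj| / a_lj (or 1 where a_lj = 0).
row_sums = [sum(abs(v) for v in row) for row in A]
r = row_sums.index(max(row_sums))
signs = [abs(v) / v if v != 0 else 1 for v in A[r]]

print(norm_1(A), vec_norm_1(apply(A, e_l)))          # 9 9
print(norm_inf(A), vec_norm_inf(apply(A, signs)))    # 15 15.0
```

Both test vectors have norm equal to unity in the corresponding vector norm, so the printed values confirm that the bounds derived above are attained.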

The norms we have considered are some functions of the matrix
of an operator. In this way we can construct not only subordinate
but also compatible norms. One of the most important compatible
norms is the so-called Euclidean norm. It will be designated $\|\cdot\|_E$.
If in the chosen bases an operator A has a matrix $A_{qe}$ with elements
$a_{ij}$, then by definition

$$\|A\|_E = \Bigl( \sum_{i=1}^{n} \sum_{j=1}^{m} |a_{ij}|^2 \Bigr)^{1/2}.$$

The right-hand side of this is the norm in an n × m-dimensional
space of linear operators. That the first three properties of (83.1)
hold is therefore beyond doubt. Of great importance is the fact that
for a Euclidean norm the fourth property of (83.1) is also valid.
To prove this we use a Cauchy-Buniakowski-Schwarz inequality
of the type (27.5).

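Let us consider vector spaces X, Y and Z of dimensions m, n and p
respectively. Let A be an operator from X to Y and B an operator
from Y to Z. By $a_{ij}$ and $b_{ij}$ we denote the elements of the matrices
of the operators in the chosen bases. We have

$$\begin{aligned}
\|BA\|_E &= \Bigl( \sum_{i=1}^{p} \sum_{j=1}^{m} \Bigl| \sum_{k=1}^{n} b_{ik} a_{kj} \Bigr|^2 \Bigr)^{1/2} \le \Bigl( \sum_{i=1}^{p} \sum_{j=1}^{m} \Bigl( \sum_{k=1}^{n} |b_{ik}| \, |a_{kj}| \Bigr)^2 \Bigr)^{1/2} \\
&\le \Bigl( \sum_{i=1}^{p} \sum_{j=1}^{m} \Bigl( \sum_{k=1}^{n} |b_{ik}|^2 \Bigr) \Bigl( \sum_{l=1}^{n} |a_{lj}|^2 \Bigr) \Bigr)^{1/2} \\
&= \Bigl( \Bigl( \sum_{i=1}^{p} \sum_{k=1}^{n} |b_{ik}|^2 \Bigr) \Bigl( \sum_{l=1}^{n} \sum_{j=1}^{m} |a_{lj}|^2 \Bigr) \Bigr)^{1/2} = \|B\|_E \cdot \|A\|_E.
\end{aligned}$$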

In the general case a Euclidean norm is not subordinate. Its
compatibility with 2-norms can be proved in the same way as the
property just considered.

A direct check makes it possible to establish important formulas
for the Euclidean norm. Namely,

$$\|A\|_E^2 = \operatorname{tr}(A_{qe}^* A_{qe}) = \operatorname{tr}(A_{qe} A_{qe}^*). \tag{84.2}$$

We can now draw the following conclusions. 

In orthonormal bases an adjoint matrix corresponds to the adjoint
operator. The chosen bases become orthonormal
if we introduce in X and Y scalar products in a way similar to (32.1).
Since the trace of a matrix is equal to the sum of its eigenvalues, it
follows from (84.2) that

The square of a Euclidean norm of an operator is equal to the sum 
of the squares of its singular values. 
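
These properties of the Euclidean norm can be checked on small real matrices. The sketch below is an illustration under assumptions: the matrices are real, so the adjoint matrix is the transpose, and the sample matrices and function names are chosen for this example only.

```python
import math

def euclidean_norm(a):
    """Square root of the sum of the squared moduli of all matrix elements."""
    return math.sqrt(sum(abs(v) ** 2 for row in a for v in row))

def matmul(b, a):
    """Product of a p x n matrix b and an n x m matrix a."""
    return [[sum(b[i][k] * a[k][j] for k in range(len(a)))
             for j in range(len(a[0]))] for i in range(len(b))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

A = [[3.0, 0.0], [4.0, 5.0]]
B = [[1.0, 2.0], [0.0, 1.0]]
E = [[1.0, 0.0], [0.0, 1.0]]

# Formula (84.2) for a real matrix: the squared norm equals tr(A'A) and tr(AA').
norm_A_sq = euclidean_norm(A) ** 2
trace_gram = trace(matmul(transpose(A), A))
trace_gram_alt = trace(matmul(A, transpose(A)))

# Fourth property of (83.1): the norm of a product does not exceed the
# product of the norms.
norm_BA = euclidean_norm(matmul(B, A))
bound = euclidean_norm(B) * euclidean_norm(A)

print(norm_A_sq, trace_gram, trace_gram_alt)
print(norm_BA, bound)
print(euclidean_norm(E))
```

For the matrix A the eigenvalues of A'A are 45 and 5, so the sum of the squares of the singular values is 50, which is also the sum of the squared elements. The last line shows that the Euclidean norm of the 2 × 2 identity matrix is the square root of 2 rather than unity, which is one way to see that this norm is not subordinate.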

If scalar products are introduced in X and Y, it is possible to speak 
of unitary operators. It is for these unitary operators that it is easy 
to show that 

A Euclidean norm is not affected by the multiplication of an operator 
by any unitary operators. 

Indeed, as noted in the exercises to Section 78, singular values
remain unaffected by the multiplication by unitary operators, and the
Euclidean norm can be expressed only in terms of singular values.

In most applications connected with norms, it is not so much an
explicit assignment of an operator norm that is important as the fact
that properties (83.1) hold. An operator norm can therefore be defined
axiomatically in terms of its matrix. Choose some bases in the spaces
in which the operators are given; then each operator will have a
corresponding matrix. We assign to each matrix a number designated
as $\|\cdot\|$ and suppose that conditions (83.1) hold as axioms. The
number $\|\cdot\|$ will be called a matrix norm. If now each operator
is assigned the norm of its matrix, it is clear that this introduces
a norm in the space of the operators. Conditions (83.1) obviously hold
for the operators, too. The converse is also true. Given fixed bases,
any operator norm generates a matrix norm. These matrix norms will be
designated by similar symbols $\|\cdot\|_2$, $\|\cdot\|_\infty$, etc.
It is obvious that we may also require axiomatically that the norm
should be compatible.

The above examples show that it is practically feasible to assign
an operator norm axiomatically in terms of a matrix norm. In what
follows, speaking of matrix and operator norms we shall always
assume that they are compatible and that conditions (83.1) hold.

Exercises 

1. Prove that, given any norm, for a unit matrix

$\|E\| \ge 1$. (84.3)

2. Let $\lambda_1, \ldots, \lambda_n$ be the eigenvalues of a matrix A. Prove that

$\inf_B \|B^{-1}AB\|_E^2 = \sum_{k=1}^{n} |\lambda_k|^2$.

Compare this equation with (81.1). 


85. Operator equations 

One of the most important problems of algebra
is that of solving linear algebraic equations. We have often met with
this problem in the present course. We now consider it from the
viewpoint of the theory of linear operators.

Given system (60.2) with elements from a field P of real or complex 
numbers, take any m-dimensional space X and n-dimensional space Y 
over the same field P and fix some bases in them. Then relations (60.2) 
will be equivalent to a single matrix equation of the type (61.2) which 
in turn is equivalent to an operator equation 

Ax = y. (85.1) 

Here A is an operator from X to Y with the same matrix in the chosen
bases as that of system (60.2). Vectors $x \in X$ and $y \in Y$ have in
the chosen bases the coordinates $(\xi_1, \ldots, \xi_m)$ and
$(\eta_1, \ldots, \eta_n)$ respectively.

Thus instead of a system of linear algebraic equations we may
consider equation (85.1). The problem is to determine all vectors
$x \in X$ satisfying (85.1) for a given operator A and a given vector
$y \in Y$. An equation of the form (85.1) is called an operator
equation, the vector y is its right-hand side and a vector x is a
solution. Of course, all the properties of a system of equations are
automatically carried over to operator equations and vice versa.

The Kronecker-Capelli theorem formulates a necessary and sufficient
condition for a system to be solvable in terms of the rank of a
matrix. This is not very convenient, since it hides the deep
connection existing between systems and equations of other types.

Let X and Y be unitary spaces. Then an operator A* is defined.
Equation (85.1) is called the basic nonhomogeneous equation, and the
equation

A*u = v 

is the adjoint nonhomogeneous equation. If the right-hand sides are 
zero, then the corresponding equations are called homogeneous. The 
following statement is true: 

Either the basic nonhomogeneous equation has a solution for any 
right-hand side or the adjoint homogeneous equation has at least one 
nonzero solution. 

Indeed, let r denote the rank of an operator A. The operator A*
will have the same rank. Two cases are possible: either r = n or
r < n. In the former case the range of A is of dimension n and hence
it coincides with Y. Therefore the basic nonhomogeneous equation
must have a solution for any right-hand side. In the same case the
nullity of the adjoint operator is equal to zero and therefore its
kernel contains no nonzero vectors, i.e. the adjoint homogeneous
equation has no nonzero solutions. If r < n, then the range of A does
not coincide with Y and the basic nonhomogeneous equation cannot have
a solution for every right-hand side. The kernel of the adjoint
operator then contains more than the zero vector and therefore the
adjoint homogeneous equation must have nonzero solutions.

The above statement is of particular importance when X and Y 
coincide. Now the existence of a solution of the basic nonhomogeneous 
equation for any right-hand side implies the nonsingularity of the 
operator A. In this case therefore we have the so-called 

Fredholm Alternative. Either the basic nonhomogeneous equation
always has a unique solution for any right-hand side or the adjoint
homogeneous equation has at least one nonzero solution.

Fredholm Theorem. For a basic nonhomogeneous equation to be
solvable, it is necessary and sufficient that its right-hand side
should be orthogonal to all solutions of the adjoint homogeneous
equation.

Proof. Let N* denote the kernel of an operator A* and T the range
of A. If the basic nonhomogeneous equation is solvable, then its
right-hand side $y \in T$. In view of (75.8) it follows that
$y \perp N^*$, i.e. that (y, u) = 0 for every vector u satisfying
A*u = 0. Now let (y, u) = 0 for the same vectors u. Then $y \perp N^*$
and by (75.8) $y \in T$. But this means that there is a vector
$x \in X$ such that Ax = y, i.e. that the basic nonhomogeneous
equation is solvable.
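
The Fredholm theorem can be illustrated numerically. The sketch below assumes NumPy and a hypothetical singular 3 x 3 matrix chosen only for illustration; solvability is tested through the least-squares residual, and the kernel of A* is read off from a singular value decomposition. The names `is_solvable` and `is_orthogonal_to_kernel` and the tolerances are assumptions of this sketch, not part of the text.

```python
import numpy as np

# Hypothetical singular example: the third row is the sum of the first two.
A = np.array([[1.0, 2.0, 0.0],
              [0.0, 1.0, 1.0],
              [1.0, 3.0, 1.0]])

# Kernel of the adjoint A*: singular vectors of A* with zero singular value.
_, s, vh = np.linalg.svd(A.conj().T)
null_adjoint = vh[s < 1e-12].conj()          # rows span ker A*

def is_solvable(y, tol=1e-10):
    """Solvability of Ax = y, judged by the least-squares residual."""
    x = np.linalg.lstsq(A, y, rcond=None)[0]
    return np.linalg.norm(A @ x - y) < tol

def is_orthogonal_to_kernel(y, tol=1e-10):
    """Check (y, u) = 0 for every u in ker A*."""
    return bool(np.all(np.abs(null_adjoint.conj() @ y) < tol))

y_good = A @ np.array([1.0, -1.0, 2.0])      # lies in the range of A
y_bad = np.array([1.0, 0.0, 0.0])            # not orthogonal to ker A*

for y in (y_good, y_bad):
    print(is_solvable(y), is_orthogonal_to_kernel(y))
```

For each right-hand side the two answers should coincide, as the theorem asserts.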

Exercises 

1. Prove that the equation A*Ax = A*y is solvable.

2. Prove that the equation $(A^*A)^p x = (A^*A)^q y$ is solvable for any positive
integers p and q.

3. Give a geometrical interpretation of the Fredholm alternative and theorem.


86. Pseudosolutions and
the pseudoinverse operator

Prescribing arbitrarily an operator A and a 
right-hand side y may result in equation (85.1) having no solution. 
Obviously, this is only due to what exactly we mean by a solution of 
an equation. 

Take a vector $x \in X$ and consider the vector r = Ax - y, called
the discrepancy of the vector x. For x to be a solution of (85.1) it is
necessary and sufficient that its discrepancy should be zero. In turn,
for the discrepancy to be zero it is necessary and sufficient that its
length should be zero. Thus all solutions of (85.1), and they alone,
satisfy the equation

$|Ax - y|^2 = 0$.


Since the zero value of the length of the discrepancy is the
smallest, the finding of solutions of equation (85.1) may be regarded
as the problem of finding such vectors x for which the following
expression attains the smallest value:

$\Phi_0(x) = |Ax - y|^2$. (86.1)

The right-hand side of the expression is called the functional of
discrepancy. Finding the vectors minimizing the functional of
discrepancy makes sense also when no solution of (85.1) exists. This
justifies the following definition:

A pseudosolution (or generalized solution) of equation (85.1) is any
vector $x \in X$ for which the functional of discrepancy attains its
smallest value. The shortest pseudosolution is called a normal
pseudosolution.

We show that a normal pseudosolution always exists and is unique.
Fix in X and Y singular bases $x_1, \ldots, x_m$ and $y_1, \ldots, y_n$. Let

$x = \sum_{k=1}^{m} \alpha_k x_k, \quad y = \sum_{p=1}^{n} \beta_p y_p$. (86.2)

Considering relations (78.2) we find that

$Ax - y = \sum_{k=1}^{t} \rho_k \alpha_k y_k - \sum_{p=1}^{n} \beta_p y_p$.

It is assumed as before that the singular values $\rho_1, \ldots, \rho_t$
are nonzero and that the rest are zero. Since singular bases are
orthonormal, we have

$\Phi_0(x) = \sum_{k=1}^{t} |\rho_k \alpha_k - \beta_k|^2 + \sum_{p=t+1}^{n} |\beta_p|^2$.

It is obvious that the smallest value of the functional of discrepancy
is attained on those vectors x whose last m - t coordinates $\alpha_k$
are arbitrary and whose first t coordinates are defined by

$\alpha_k = \beta_k / \rho_k$. (86.3)

The normal pseudosolution will be as follows:

$x_0 = \sum_{k=1}^{t} \frac{\beta_k}{\rho_k} x_k$. (86.4)

We recall that the vectors $x_{t+1}, \ldots, x_m$ form a basis of the
kernel N of the operator A. Therefore the set of all pseudosolutions
is a plane in X whose direction subspace coincides with N and whose
translation vector coincides with any pseudosolution. A normal
pseudosolution is the only vector of that plane that is orthogonal
to N.

Using relations (78.2) and (78.3) it is easy to show that
pseudosolutions, and they alone, satisfy

A*Ax = A*y. (86.5)

Indeed, write vectors x and y as expansions (86.2). We have

$A^*Ax = \sum_{k=1}^{t} \rho_k^2 \alpha_k x_k, \quad A^*y = \sum_{p=1}^{t} \rho_p \beta_p x_p$.

It follows that solutions of equation (86.5) are only those vectors x
whose first t coordinates $\alpha_k$ are calculated according to (86.3)
and whose last m - t coordinates are arbitrary.

Thus, if the solvability of (85.1) is not guaranteed, then we can always
replace the solution of the equation by the solution of (86.5). In
addition, a minimization of the functional of discrepancy for (85.1) is
ensured.
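
The construction of the normal pseudosolution can be followed step by step on a small example. This is a sketch only: it assumes NumPy, takes the singular bases from `numpy.linalg.svd` (columns of `U` as the y_k, conjugated rows of `Vh` as the x_k, which is an assumption about how (78.2) maps onto that routine), and uses a hypothetical rank-deficient matrix.

```python
import numpy as np

# Hypothetical rank-one example: the two columns are proportional.
A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])
y = np.array([1.0, 0.0, 1.0])

U, rho, Vh = np.linalg.svd(A)
t = int(np.sum(rho > 1e-12))                 # number of nonzero singular values

beta = U.conj().T @ y                        # coordinates of y in the basis y_k
alpha = beta[:t] / rho[:t]                   # formula (86.3)
x0 = Vh[:t].conj().T @ alpha                 # formula (86.4)

# Adding any kernel vector gives another pseudosolution with the same discrepancy.
kernel_vector = Vh[t:].conj().T @ np.ones(A.shape[1] - t)
x_other = x0 + kernel_vector

print(x0, np.linalg.norm(A @ x0 - y), np.linalg.norm(A @ x_other - y))
```

Both `x0` and `x_other` satisfy A*Ax = A*y, but only `x0` is orthogonal to the kernel and therefore the shortest.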

The inverse operator plays an important part in carrying out
various studies. However, it was defined only for a nonsingular
operator and we have no analogue as yet for a singular operator and
for an operator from one space to another. This analogue can be
constructed on the basis of pseudosolutions.

Suppose that A is an operator from X to Y. Then each vector
$y \in Y$ can be assigned a unique vector $x_0 \in X$ which is the normal
pseudosolution of (85.1). This correspondence defines some operator
$A^+$ from Y to X called the pseudoinverse (or generalized inverse) of
A. So by definition

$x_0 = A^+ y$ (86.6)

for any $y \in Y$. It is clear that if the operator A is nonsingular,
then the pseudoinverse of A coincides with its inverse. We investigate
the properties of the pseudoinverse.

Suppose along with (86.6) we have $u_0 = A^+ v$ for some vector
$v \in Y$. Consider the vector $\alpha y + \beta v$ for any $\alpha$ and
$\beta$. If we take it as a right-hand side of (85.1), then the vector
$\alpha x_0 + \beta u_0$ will clearly satisfy a corresponding equation
of the type (86.5) and therefore it will be a pseudosolution. Since
$x_0$ and $u_0$ are orthogonal to the kernel of A, so is the vector
$\alpha x_0 + \beta u_0$. Hence it is the normal pseudosolution. The
linearity of the pseudoinverse operator is thus established.

The properties of the pseudoinverse operator are easy to establish
if we consider its action on the vectors of singular bases. By (86.4)
we have

$A^+ y_k = \begin{cases} \rho_k^{-1} x_k, & k \le t, \\ 0, & k > t. \end{cases}$ (86.7)


It follows that

The domain, kernel and range of the pseudoinverse operator and those
of the adjoint operator coincide.

Using (78.2), (78.3) and (86.7) it is possible to obtain various
relations connecting the operators $A$, $A^*$ and $A^+$. We note some of
them:

(1) $(A^*)^+ = (A^+)^*$,

(2) $(A^+)^+ = A$,

(3) $(AA^+)^* = AA^+$, $(AA^+)^2 = AA^+$,

(4) $(A^+A)^* = A^+A$, $(A^+A)^2 = A^+A$,

(5) $AA^+A = A$.


These relations can be proved according to the same scheme. As an 
example, we therefore consider in more detail only the first and the 
third. 

Comparing (78.2) and (86.7) we take as an operator A the adjoint
operator A*. Since (78.3) holds for this operator, we have

$(A^*)^+ x_k = \begin{cases} \rho_k^{-1} y_k, & k \le t, \\ 0, & k > t. \end{cases}$

Now, proceeding from (86.7), we apply a relation similar to (78.3)
to the operator $(A^+)^*$. Then

$(A^+)^* x_k = \begin{cases} \rho_k^{-1} y_k, & k \le t, \\ 0, & k > t. \end{cases}$

Thus the operators $(A^*)^+$ and $(A^+)^*$ coincide on the basis
$x_1, \ldots, x_m$ and therefore they are equal.

Taking into account (78.2) and (86.7) we conclude that for the
operator $AA^+$

$AA^+ y_k = \begin{cases} y_k, & k \le t, \\ 0, & k > t. \end{cases}$ (86.8)

This means that $AA^+$ has an orthonormal system of eigenvectors
$y_1, \ldots, y_n$ and real eigenvalues 1 and 0, i.e. that it is Hermitian.
This proves the first equation in the relations of group (3). The second
is obvious from (86.8).
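
Relations (1) to (5) can also be checked numerically. The sketch below assumes NumPy and uses `numpy.linalg.pinv` as the pseudoinverse; the explicit `rcond` cutoff, the helper names and the random rank-deficient matrix are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_complex(rows, cols):
    return rng.standard_normal((rows, cols)) + 1j * rng.standard_normal((rows, cols))

# Illustrative rank-2 matrix of an operator from a 4-dimensional X
# to a 3-dimensional Y.
A = random_complex(3, 2) @ random_complex(2, 4)

def pinv(M):
    """Pseudoinverse with an explicit cutoff for numerically zero singular values."""
    return np.linalg.pinv(M, rcond=1e-10)

def adjoint(M):
    return M.conj().T

A_plus = pinv(A)
P = A @ A_plus            # the operator A A+ acting in Y
Q = A_plus @ A            # the operator A+ A acting in X

checks = {
    "(1)": np.allclose(pinv(adjoint(A)), adjoint(A_plus)),
    "(2)": np.allclose(pinv(A_plus), A),
    "(3)": np.allclose(adjoint(P), P) and np.allclose(P @ P, P),
    "(4)": np.allclose(adjoint(Q), Q) and np.allclose(Q @ Q, Q),
    "(5)": np.allclose(A @ A_plus @ A, A),
}
print(checks)
```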


Exercises 


1. What is the pseudoinverse of a zero operator? 

2. Let X and Y be distinct spaces. Write the matrix of the pseudoinverse of
an operator in singular bases and compare it with (78.4).

3. Let U and V be unitary operators in X and Y respectively. Prove that

$(VAU)^+ = U^* A^+ V^*$.

4. Prove that there are operators K in X and L in Y such that

$A^+ = KA^* = A^*L$.

Describe the action of the operators K and L.

5. Prove that the pseudoinverse of an operator is uniquely defined by the
conditions

$AA^+A = A$,

$A^+ = KA^* = A^*L$.


6. Prove that all pseudosolutions, and they alone, are solutions of the
equation


$Ax = AA^+y$.


7. Give a geometrical interpretation of pseudosolutions. 


87. Perturbation and nonsingularity 
of an operator 

We have repeatedly emphasized that small
changes in the basis, coordinate vectors, matrix elements and the
like may result in changes of many properties connected with the
concept of linear dependence. This notion plays a decisive role in
the entire theory of linear operators, so it is very important to study
the influence of small changes in operators themselves on their
properties.

As an auxiliary tool in solving diverse questions one often has to
use an operator almost equal to an identity operator. By
this we shall mean an operator in a space X of the form E + A,
where $\|A\| < 1$ for some norm.

If $\lambda$ is any eigenvalue of an operator A, then $1 + \lambda$ is an
eigenvalue of the operator E + A. Since $|\lambda| \le \|A\|$, by virtue
of $\|A\| < 1$ all eigenvalues of A are less than unity in absolute
value. Hence all eigenvalues of E + A are nonzero and the operator is
nonsingular.

Thus if $\|A\| < 1$, there is an operator $(E + A)^{-1}$. If, however, the
operator E + A is singular, then $\|A\| \ge 1$ for any norm.

For any number a less than unity in absolute value we have the
limiting relation

$(1 + a)^{-1} = \lim_{p \to \infty} a_p$,

where

$a_p = \sum_{k=0}^{p} (-a)^k$.

We show that a similar relation holds for the operator $(E + A)^{-1}$
too, if $\|A\| < 1$. Consider a sequence $\{A_p\}$ of operators

$A_p = \sum_{k=0}^{p} (-A)^k$.

It is easy to verify that

$(E + A) A_p = E - (-A)^{p+1}$,

so

$\|(E + A) A_p - E\| = \|A^{p+1}\|$. (87.1)

Formally this equation is true for p = -1 as well, if it is assumed
that $A_{-1} = 0$. Also we have

$\|(E + A) A_p - E\| = \|(A_p - (E + A)^{-1}) + A (A_p - (E + A)^{-1})\|$

$\ge \big| \|A_p - (E + A)^{-1}\| - \|A\| \cdot \|A_p - (E + A)^{-1}\| \big|$

$= (1 - \|A\|) \, \|A_p - (E + A)^{-1}\|$.

Now, considering (87.1), we obtain for p = -1 an estimate of the
norm of the operator $(E + A)^{-1}$, i.e.

$\|(E + A)^{-1}\| \le \frac{\|E\|}{1 - \|A\|}$.

For any subordinate norm $\|E\| = 1$ and hence in this case

$\|(E + A)^{-1}\| \le \frac{1}{1 - \|A\|}$. (87.2)

For $p \ge 0$ we obtain an estimate of the deviation of the operator $A_p$
from the operator $(E + A)^{-1}$. Namely,

$\|A_p - (E + A)^{-1}\| \le \frac{\|A\|^{p+1}}{1 - \|A\|}$. (87.3)

By virtue of the condition $\|A\| < 1$ this means that $\{A_p\}$ converges
to $(E + A)^{-1}$. If $A_p$ is assumed to be an approximation to
$(E + A)^{-1}$, then formula (87.3) gives an estimate of the accuracy of
the approximation.
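
Estimates (87.2) and (87.3) are easy to observe on a small example. The following sketch assumes NumPy and the spectral norm as the subordinate norm; the 2 x 2 matrix is hypothetical and chosen only so that its norm is less than unity.

```python
import numpy as np

A = np.array([[0.2, -0.3],
              [0.1,  0.4]])                  # illustrative matrix with norm < 1
E = np.eye(2)
norm_A = np.linalg.norm(A, 2)                # spectral norm (an assumption here)
exact = np.linalg.inv(E + A)

def partial_sum(p):
    """A_p = sum over k = 0..p of (-A)^k."""
    total, term = np.zeros_like(A), np.eye(2)
    for _ in range(p + 1):
        total = total + term
        term = term @ (-A)
    return total

errors = [np.linalg.norm(partial_sum(p) - exact, 2) for p in range(8)]
bounds = [norm_A ** (p + 1) / (1 - norm_A) for p in range(8)]   # (87.3)

for p, (err, bound) in enumerate(zip(errors, bounds)):
    print(p, err, bound)
```

Each printed error should stay below the corresponding bound, and both decrease geometrically with p.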

Let A be any nonsingular operator. Consider an operator $A + \varepsilon_A$,
where $\varepsilon_A$ is an arbitrary operator. We shall call $\varepsilon_A$
the perturbation of the operator A, and $A + \varepsilon_A$ a perturbed
operator. We show under what conditions on the value of the
perturbation norm the perturbed operator is nonsingular. We shall be
concerned only with small values of the perturbation norm.

The operator A is nonsingular and therefore there is an operator
$A^{-1}$. Hence

$A + \varepsilon_A = A (E + A^{-1} \varepsilon_A)$.

It follows that $A + \varepsilon_A$ is nonsingular if and only if so is the
operator $E + A^{-1} \varepsilon_A$. This condition clearly holds if

$\|A^{-1} \varepsilon_A\| < 1$

for some norm. Of course it holds if $\|A^{-1}\| \, \|\varepsilon_A\| < 1$.

Thus a perturbed operator is nonsingular for all perturbations
satisfying

$\|\varepsilon_A\| < \|A^{-1}\|^{-1}$. (87.4)


When an operator A is perturbed by an amount $\varepsilon_A$, the inverse
operator $A^{-1}$ acquires a perturbation equal to
$(A + \varepsilon_A)^{-1} - A^{-1}$. We denote by

$\delta_A = \frac{\|\varepsilon_A\|}{\|A\|}, \quad \delta_{A^{-1}} = \frac{\|(A + \varepsilon_A)^{-1} - A^{-1}\|}{\|A^{-1}\|}$ (87.5)

the values of relative perturbations of A and $A^{-1}$. When condition
(87.4) holds, the operator $E + A^{-1} \varepsilon_A$ is nonsingular and
therefore

$(A + \varepsilon_A)^{-1} - A^{-1} = ((A + \varepsilon_A)^{-1} A - E) A^{-1}$

$= ((A^{-1} (A + \varepsilon_A))^{-1} - E) A^{-1} = ((E + A^{-1} \varepsilon_A)^{-1} - E) A^{-1}$.

From formula (87.3), for p = 0 we find that

$\|(A + \varepsilon_A)^{-1} - A^{-1}\| \le \frac{\|A^{-1} \varepsilon_A\| \cdot \|A^{-1}\|}{1 - \|A^{-1} \varepsilon_A\|} \le \frac{\|A^{-1}\|^2 \, \|\varepsilon_A\|}{1 - \|A^{-1}\| \cdot \|\varepsilon_A\|}$.

Now, using symbols (87.5), we obtain the following estimate:

$\delta_{A^{-1}} \le \frac{\nu_A \delta_A}{1 - \nu_A \delta_A}$, (87.6)

where

$\nu_A = \|A^{-1}\| \cdot \|A\|$. (87.7)

The number $\nu_A$ is called the condition number of the operator A.
Although it depends on the choice of norm, it can never be very small.
From

$E = A^{-1} A$

and (84.3) we conclude that

$1 \le \|E\| \le \|A^{-1}\| \cdot \|A\| = \nu_A$.

Formula (87.6) shows that a small relative perturbation of an
operator A results in a small relative perturbation of $A^{-1}$ only when
the condition number of A is not too large as compared with unity. This
number will occur in other problems too.
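
Estimate (87.6) can be tried on a nearly singular matrix. This is a minimal sketch assuming NumPy and the spectral norm; the matrix `A` and the perturbation `eps_A` are hypothetical and were chosen so that condition (87.4) holds.

```python
import numpy as np

def spectral_norm(M):
    return np.linalg.norm(M, 2)

A = np.array([[1.0, 1.0],
              [1.0, 1.001]])                 # illustrative nearly singular matrix
eps_A = 1e-5 * np.array([[1.0, -1.0],
                         [0.5,  2.0]])       # illustrative perturbation

A_inv = np.linalg.inv(A)
nu_A = spectral_norm(A_inv) * spectral_norm(A)          # condition number (87.7)

delta_A = spectral_norm(eps_A) / spectral_norm(A)       # symbols (87.5)
delta_A_inv = spectral_norm(np.linalg.inv(A + eps_A) - A_inv) / spectral_norm(A_inv)
bound = nu_A * delta_A / (1 - nu_A * delta_A)           # right-hand side of (87.6)

print(nu_A, delta_A, delta_A_inv, bound)
```

The relative perturbation of the inverse is several thousand times larger than that of A itself, in line with the size of the condition number.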

Suppose that given a nonsingular operator A we are to solve the 
operator equation 

Ax = y. (87.8) 

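Consider the perturbed equation

$(A + \varepsilon_A) \tilde{x} = y + \varepsilon_y$. (87.9)

If condition (87.4) holds, then the perturbed equation (87.9) and the
original equation (87.8) will have unique solutions $\tilde{x}$ and x. We
evaluate their difference.

Along with (87.5) and (87.7) we introduce the corresponding symbols
for relative perturbations in x and y, i.e.

$\delta_x = \frac{\|\tilde{x} - x\|}{\|x\|}, \quad \delta_y = \frac{\|\varepsilon_y\|}{\|y\|}$.

We have

$x = A^{-1} y, \quad \tilde{x} = (A + \varepsilon_A)^{-1} (y + \varepsilon_y)$.

From this we find

$\tilde{x} - x = ((E + A^{-1} \varepsilon_A)^{-1} - E) A^{-1} y + (E + A^{-1} \varepsilon_A)^{-1} A^{-1} \varepsilon_y$

and further

$\|\tilde{x} - x\| \le \|(E + A^{-1} \varepsilon_A)^{-1} - E\| \cdot \|x\| + \|(E + A^{-1} \varepsilon_A)^{-1}\| \cdot \|A^{-1}\| \cdot \|\varepsilon_y\|$.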


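It is assumed that a subordinate norm is used. Taking into account
estimates (87.2) and (87.3), as well as the inequality
$\|y\| \le \|A\| \cdot \|x\|$, we get

$\|\tilde{x} - x\| \le \frac{\|A^{-1}\| \cdot \|\varepsilon_A\|}{1 - \|A^{-1}\| \cdot \|\varepsilon_A\|} \|x\| + \frac{\|A^{-1}\| \cdot \|\varepsilon_y\|}{1 - \|A^{-1}\| \cdot \|\varepsilon_A\|}$

$\le \frac{\|A^{-1}\| \cdot \|A\| \cdot \frac{\|\varepsilon_A\|}{\|A\|}}{1 - \|A^{-1}\| \cdot \|A\| \cdot \frac{\|\varepsilon_A\|}{\|A\|}} \|x\| + \frac{\|A^{-1}\| \cdot \|A\|}{1 - \|A^{-1}\| \cdot \|A\| \cdot \frac{\|\varepsilon_A\|}{\|A\|}} \cdot \frac{\|\varepsilon_y\|}{\|y\|} \|x\|$.

In symbols

$\delta_x \le \frac{\nu_A}{1 - \nu_A \delta_A} (\delta_A + \delta_y)$. (87.10)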

The condition number appears in this formula again, and again it is
important from the viewpoint of stability that it should not be too
large.
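
A small numerical experiment shows how estimate (87.10) is used. The sketch assumes NumPy, the spectral norm for operators and the Euclidean length for vectors; all data below are hypothetical.

```python
import numpy as np

def spectral_norm(M):
    return np.linalg.norm(M, 2)

A = np.array([[4.0, 1.0],
              [2.0, 3.0]])
y = np.array([1.0, 2.0])
eps_A = 1e-3 * np.array([[1.0, 0.0],
                         [-1.0, 2.0]])       # illustrative perturbation of A
eps_y = 1e-3 * np.array([-1.0, 1.0])         # illustrative perturbation of y

x = np.linalg.solve(A, y)                            # solution of (87.8)
x_tilde = np.linalg.solve(A + eps_A, y + eps_y)      # solution of (87.9)

nu_A = spectral_norm(np.linalg.inv(A)) * spectral_norm(A)
delta_A = spectral_norm(eps_A) / spectral_norm(A)
delta_y = np.linalg.norm(eps_y) / np.linalg.norm(y)
delta_x = np.linalg.norm(x_tilde - x) / np.linalg.norm(x)
bound = nu_A / (1 - nu_A * delta_A) * (delta_A + delta_y)   # (87.10)

print(delta_x, bound)
```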


Exercises 

1. Prove that a condition number expressed in terms 
of spectral norm is equal to the ratio of the maximum singular value to the 
minimum singular value. 

2. There are operators with the smallest condition number. What are these 
operators if spectral norm is used? 

3. Prove that multiplication of an operator by unitary operators leaves its 
condition number expressed in terms of spectral or Euclidean norm unchanged. 

4. Prove that for any nonsingular operators A and B

$\frac{\|B^{-1} - A^{-1}\|}{\|B^{-1}\|} \le \nu_A \frac{\|A - B\|}{\|A\|}$.

5. What causes the large instability of the system of vectors described in
Section 22? Evaluate the condition number of the operator whose matrix columns
coincide with the coordinates of vectors (22.7).

88. Stable solution of equations 

Formula (87.10) shows that for an operator
almost equal to a singular operator large perturbations are possible
in a solution even for small perturbations in the operator and the
right-hand side. It may seem that this is due only to the fact that a
solution itself does not always exist. However, the situation with
finding pseudosolutions is similar.

Indeed, let an operator be in a two-dimensional space. Suppose
that in some orthonormal basis a system of linear algebraic equations
of the following form corresponds to (85.1):

$1 \cdot x_1 + 0 \cdot x_2 = 1$,

$0 \cdot x_1 + 0 \cdot x_2 = 1$.

It is easy to find that the normal pseudosolution $u_0$ has the following
coordinates:

$u_0 = (1, 0)$.

It may well be that the perturbed equation leads in the same
basis to a system

$1 \cdot x_1 + 0 \cdot x_2 = 1$,

$0 \cdot x_1 + \varepsilon \cdot x_2 = 1$,

where the number $\varepsilon$, although small, will nevertheless be other
than zero. Now the normal pseudosolution of the perturbed equation
has the coordinates

$u_0^{(\varepsilon)} = (1, \varepsilon^{-1})$.

For small $\varepsilon$ the vectors $u_0$ and $u_0^{(\varepsilon)}$ not only
differ very much but are even almost orthogonal.
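
The two-dimensional example can be reproduced directly. The sketch assumes NumPy and uses `numpy.linalg.pinv` to obtain normal pseudosolutions; the helper names are illustrative.

```python
import numpy as np

y = np.array([1.0, 1.0])
A0 = np.array([[1.0, 0.0],
               [0.0, 0.0]])
u0 = np.linalg.pinv(A0) @ y                  # normal pseudosolution of the original system

def perturbed_pseudosolution(eps):
    """Normal pseudosolution after the zero element (2, 2) is replaced by eps."""
    A_eps = np.array([[1.0, 0.0],
                      [0.0, eps]])
    return np.linalg.pinv(A_eps) @ y

def cosine(u, v):
    """Cosine of the angle between two real vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

for eps in (1e-2, 1e-4, 1e-6):
    u_eps = perturbed_pseudosolution(eps)
    print(eps, u_eps, cosine(u0, u_eps))
```

As eps decreases, the second coordinate grows like 1/eps and the cosine of the angle between the two vectors tends to zero.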

If our equation has more than one pseudosolution, then in the
general case small perturbations in the operator and the right-hand
side will always result in large perturbations in the normal
pseudosolution. Nevertheless we show that despite the instability of
many concepts connected with operator equations a normal
pseudosolution can be stably determined.

Let A be an operator from X to Y and suppose that equation (85.1)
is to be solved. Similarly to the functional of discrepancy we consider
the so-called regularizing functional

$\Phi_\alpha(x) = \alpha |x|^2 + |Ax - y|^2$, (88.1)

where the number $\alpha \ge 0$. It is clear that for $\alpha = 0$ the
functional coincides with the functional of discrepancy and attains its
minimum on the pseudosolutions of (85.1). We find on what vectors the
regularizing functional attains its minimum for $\alpha > 0$. Using
expansions (86.2) we find

$\Phi_\alpha(x) = \sum_{k=1}^{t} (\alpha |\alpha_k|^2 + |\rho_k \alpha_k - \beta_k|^2) + \alpha \sum_{k=t+1}^{m} |\alpha_k|^2 + \sum_{p=t+1}^{n} |\beta_p|^2$.

It follows that for the minimum to be attained it is necessary to take
the zero values of the last coordinates $\alpha_{t+1}, \ldots, \alpha_m$
and to minimize for each $k \le t$ the expression

$\alpha |\alpha_k|^2 + |\rho_k \alpha_k - \beta_k|^2$.

This yields for $k \le t$

$\alpha_k = \frac{\rho_k \beta_k}{\alpha + \rho_k^2}$.

Thus a minimum value of the regularizing functional (88.1) is
attained for every $\alpha > 0$ on a unique vector

$x_\alpha = \sum_{k=1}^{t} \frac{\rho_k \beta_k}{\alpha + \rho_k^2} x_k$. (88.2)

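A comparison of formulas (86.4) and (88.2) makes it possible to
establish some relations connecting $x_\alpha$ and $x_0$. We have for
$\rho, \alpha > 0$

$0 \le \frac{|\beta|^2}{\rho^2} - \frac{\rho^2 |\beta|^2}{(\alpha + \rho^2)^2} = \frac{|\beta|^2 (\alpha^2 + 2\alpha\rho^2)}{\rho^2 (\alpha + \rho^2)^2} \le \frac{2\alpha |\beta|^2 (\alpha + \rho^2)}{\rho^2 (\alpha + \rho^2)^2} \le \frac{2\alpha |\beta|^2}{\rho^4}$,

hence

$0 \le |x_0|^2 - |x_\alpha|^2 \le 2\alpha\eta^2$, (88.3)

where

$\eta^2 = \sum_{k=1}^{t} \frac{|\beta_k|^2}{\rho_k^4}$.

We then find

$x_0 - x_\alpha = \alpha \sum_{k=1}^{t} \frac{\beta_k}{\rho_k (\alpha + \rho_k^2)} x_k$,

from which we conclude that

$|x_0 - x_\alpha| \le \alpha\gamma$, (88.4)

where

$\gamma^2 = \sum_{k=1}^{t} \frac{|\beta_k|^2}{\rho_k^6}$.

Consequently

$\lim_{\alpha \to 0} x_\alpha = x_0$.

Thus, for small values of $\alpha$ the vector $x_\alpha$ may serve as an
approximation to the normal pseudosolution $x_0$.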

We expand the vectors $x_\alpha$ and $x_0$ with respect to singular bases
in a way similar to (86.2). A direct check easily shows that $x_\alpha$
satisfies

$(A^*A + \alpha E) x_\alpha = A^*y$. (88.5)

For $\alpha > 0$ the operator $A^*A + \alpha E$ is positive definite and
therefore there is an operator $(A^*A + \alpha E)^{-1}$, i.e.

$x_\alpha = (A^*A + \alpha E)^{-1} A^*y$. (88.6)
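
Formula (88.6) and estimate (88.4) can be checked on a small rank-deficient example. The sketch assumes NumPy; the matrix, the right-hand side and the list of values of the parameter are hypothetical, and the constant from (88.4) is computed from a singular value decomposition.

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 4.0],
              [0.0, 0.0]])                   # illustrative rank-one operator
y = np.array([1.0, 0.0, 1.0])
x0 = np.linalg.pinv(A, rcond=1e-10) @ y      # normal pseudosolution

def regularized(alpha):
    """x_alpha from (88.6): (A*A + alpha E)^(-1) A* y."""
    n = A.shape[1]
    return np.linalg.solve(A.conj().T @ A + alpha * np.eye(n), A.conj().T @ y)

# gamma from (88.4), built from the nonzero singular values rho_k and the
# coordinates beta_k of y in the singular basis of Y.
U, rho, _ = np.linalg.svd(A)
t = int(np.sum(rho > 1e-10))
beta = U.conj().T @ y
gamma = np.sqrt(np.sum(np.abs(beta[:t]) ** 2 / rho[:t] ** 6))

alphas = [1.0, 1e-1, 1e-2, 1e-3]
errors = [np.linalg.norm(x0 - regularized(a)) for a in alphas]

for a, err in zip(alphas, errors):
    print(a, err, a * gamma)
```

The error decreases in proportion to the parameter and never exceeds the bound from (88.4).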

On $x_\alpha$ the minimum value of functional (88.1) is attained and
therefore $\Phi_\alpha(x_\alpha) \le \Phi_\alpha(x_0)$. Taking into account
(88.3) and (88.4) yields

$|Ax_\alpha - y|^2 \le |Ax_0 - y|^2 + \alpha (|x_0|^2 - |x_\alpha|^2) \le |Ax_0 - y|^2 + 2\alpha^2\eta^2$. (88.7)

In addition $\Phi_\alpha(x_\alpha) \le \Phi_\alpha(0)$, from which it follows
that

$|x_\alpha| \le \frac{|y|}{\alpha^{1/2}}$.

Together with (88.6) this means that, given any operator A and any
vector y, for $\alpha > 0$

$|(A^*A + \alpha E)^{-1} A^*y| \le \frac{|y|}{\alpha^{1/2}}$. (88.8)

In practice, when solving (85.1), the operator A and the right-hand
side y are usually given inexactly and one has instead to consider
the perturbed operator $\tilde{A}$ and the right-hand side $\tilde{y}$.
If in X and Y one uses the length of vectors as a norm, then the
spectral norm of the operators is subordinate to it. We shall therefore
assume that

$\|\tilde{A} - A\|_2 \le \varepsilon_A, \quad \|\tilde{y} - y\| \le \varepsilon_y$. (88.9)

Determining an approximate solution $\tilde{x}_\alpha$ using the perturbed
$\tilde{A}$ and $\tilde{y}$ leads to the following equation:

$(\tilde{A}^*\tilde{A} + \alpha E) \tilde{x}_\alpha = \tilde{A}^*\tilde{y}$. (88.10)

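From (88.5) and (88.10) we find

$(\tilde{A}^*\tilde{A} + \alpha E)(\tilde{x}_\alpha - x_\alpha) = A^*(Ax_\alpha - y) - \tilde{A}^*(\tilde{A}x_\alpha - \tilde{y})$

$= (A - \tilde{A})^*(Ax_\alpha - y) - \tilde{A}^*((\tilde{A} - A)x_\alpha - (\tilde{y} - y))$.

This means that the difference $\tilde{x}_\alpha - x_\alpha$ is a solution
of the equation with operator $\tilde{A}^*\tilde{A} + \alpha E$ and the
right-hand side of the form $z = u + \tilde{A}^*v$, where

$u = (A - \tilde{A})^*(Ax_\alpha - y)$,

$v = -((\tilde{A} - A)x_\alpha - (\tilde{y} - y))$.

Therefore

$\tilde{x}_\alpha - x_\alpha = (\tilde{A}^*\tilde{A} + \alpha E)^{-1}u + (\tilde{A}^*\tilde{A} + \alpha E)^{-1}\tilde{A}^*v$.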

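Now we evaluate the norms of both summands in this equation.

The eigenvalues of the operator $\tilde{A}^*\tilde{A} + \alpha E$ are at
least $\alpha$. Hence the eigenvalues of
$(\tilde{A}^*\tilde{A} + \alpha E)^{-1}$ are at most $\alpha^{-1}$. For a
positive definite operator its spectral norm coincides with its maximum
eigenvalue, i.e.

$\|(\tilde{A}^*\tilde{A} + \alpha E)^{-1}\|_2 \le \alpha^{-1}$.

Considering (88.7) and (88.9) we have

$\|(\tilde{A}^*\tilde{A} + \alpha E)^{-1}u\| \le \|(\tilde{A}^*\tilde{A} + \alpha E)^{-1}\|_2 \, \|u\|$

$\le \frac{1}{\alpha} \|A - \tilde{A}\|_2 \, \|Ax_\alpha - y\| \le \frac{\varepsilon_A}{\alpha} (\|Ax_0 - y\|^2 + 2\alpha^2\eta^2)^{1/2}$.

To evaluate the second summand we use formulas (88.3), (88.8)
and (88.9). We find

$\|(\tilde{A}^*\tilde{A} + \alpha E)^{-1}\tilde{A}^*v\| \le \frac{1}{\alpha^{1/2}} (\varepsilon_A \|x_0\| + \varepsilon_y)$.

So

$\|\tilde{x}_\alpha - x_\alpha\| \le \frac{\varepsilon_A}{\alpha} (\|Ax_0 - y\|^2 + 2\alpha^2\eta^2)^{1/2} + \frac{1}{\alpha^{1/2}} (\varepsilon_A \|x_0\| + \varepsilon_y)$.

The total error of the computed pseudosolution $\tilde{x}_\alpha$ is

$\|\tilde{x}_\alpha - x_0\| \le \|\tilde{x}_\alpha - x_\alpha\| + \|x_0 - x_\alpha\|$

$\le \alpha\gamma + \frac{\varepsilon_A}{\alpha} (\|Ax_0 - y\|^2 + 2\alpha^2\eta^2)^{1/2} + \frac{1}{\alpha^{1/2}} (\varepsilon_A \|x_0\| + \varepsilon_y)$. (88.11)

The right-hand side of this is independent of the perturbed $\tilde{A}$
and $\tilde{y}$. There is therefore an $\alpha$ such that the right-hand
side attains its minimum. That value of $\alpha$ will ensure almost the
best approximation $\tilde{x}_\alpha$ to the exact normal pseudosolution
$x_0$.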

Suppose that $\varepsilon_A$ and $\varepsilon_y$ are values of an order of
$\varepsilon$ and that $\varepsilon$ is sufficiently small. If the original
equation (85.1) has a solution, then $Ax_0 - y = 0$. In this case the
right-hand side of (88.11) is, according to the nature of its dependence
on $\alpha$ and $\varepsilon$, a function of the form

$\alpha + \varepsilon + \frac{\varepsilon}{\alpha^{1/2}}$.

For $\alpha = \varepsilon^{2/3}$ it takes on a value of an order of
$\varepsilon^{2/3}$. If, however, the original equation has no solution,
then $Ax_0 - y \ne 0$. Now the right-hand side of (88.11) is a function
of the form

$\alpha + \frac{\varepsilon}{\alpha} + \frac{\varepsilon}{\alpha^{1/2}}$.

For $\alpha = \varepsilon^{1/2}$ it takes on a value of an order of
$\varepsilon^{1/2}$.

Thus, if the input data in (85.1) are prescribed up to an order of
$\varepsilon$, then the normal pseudosolution can be determined up to an
order of $\varepsilon^{2/3}$, if the original equation is solvable, and up
to $\varepsilon^{1/2}$ otherwise.
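
The effect of regularization can be seen on the unsolvable two-dimensional example given earlier in this section. The sketch assumes NumPy and chooses the parameter as the square root of the perturbation size, following the unsolvable case above; the helper `regularized` and the list of perturbation sizes are assumptions made for illustration.

```python
import numpy as np

y = np.array([1.0, 1.0])
A = np.array([[1.0, 0.0],
              [0.0, 0.0]])
x0 = np.array([1.0, 0.0])                    # normal pseudosolution of the exact equation

def regularized(A_tilde, y_tilde, alpha):
    """Solve (88.10) for the perturbed data."""
    n = A_tilde.shape[1]
    return np.linalg.solve(A_tilde.conj().T @ A_tilde + alpha * np.eye(n),
                           A_tilde.conj().T @ y_tilde)

results = []
for eps in (1e-2, 1e-4, 1e-6):
    A_tilde = A + np.array([[0.0, 0.0],
                            [0.0, eps]])     # perturbation of spectral norm eps
    alpha = eps ** 0.5                       # unsolvable case: alpha of order eps^(1/2)
    err_regularized = np.linalg.norm(regularized(A_tilde, y, alpha) - x0)
    err_plain = np.linalg.norm(np.linalg.pinv(A_tilde) @ y - x0)
    results.append((eps, err_regularized, err_plain))
    print(eps, err_regularized, err_plain)
```

The error of the regularized vector shrinks roughly like the square root of eps, while the error of the unregularized normal pseudosolution of the perturbed system grows like 1/eps.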

The parameter $\alpha$ ensuring the required approximation
$\tilde{x}_\alpha$ cannot be found only from the perturbed $\tilde{A}$ and
$\tilde{y}$. This is mainly due to the fact that conditions (88.9) do not
guarantee the continuity of the normal pseudosolution in a given range
of the operator and the right-hand side. To determine the parameter
$\alpha$ use is usually made of additional information about the
solution. In some problems no guaranteed closeness to the normal
pseudosolution is required and it is considered sufficient to determine
stably a minimum of the functional of discrepancy. In such problems it
is a somewhat simpler matter to determine $\alpha$. Despite the
importance of these questions we shall not dwell on them, since they
are beyond the scope of this book.


Exercises

1. Prove that $\eta$ in estimate (88.3) is the norm of the
normal solution of

$A^*A (A^*A)^{1/2} x = A^*y$.

2. Prove that $\gamma$ in estimate (88.4) is the norm of the normal solution of

$(A^*A)^2 x = A^*y$.

3. Prove that the difference $x_\alpha - x_\beta$ satisfies

$(A^*A + \alpha E)(A^*A + \beta E)(x_\alpha - x_\beta) = (\beta - \alpha) A^*y$.

4. Compare (88.11) and (87.10). What can be said about estimate (88.11)
in the case of a nonsingular operator A?

5. To what accuracy can a normal pseudosolution be computed if $\varepsilon_A = 0$?


89. Perturbation and eigenvalues 

In the general case the perturbation of an
operator leads to changes in all of its eigenvalues and eigenvectors.
Since the study of this relation is very complicated, we restrict
ourselves to some illustrations. It is more convenient to describe this
problem in terms of the matrices of operators rather than operators
themselves.

Let B be a matrix of a simple structure and H a matrix such that

$H^{-1}BH = \Lambda$, (89.1)

where $\Lambda$ is a diagonal matrix of eigenvalues
$\lambda_1, \lambda_2, \ldots, \lambda_m$. Consider a perturbed matrix
$B + \varepsilon_B$ and some eigenvalue $\lambda$ of it. The matrix
$B + \varepsilon_B - \lambda E$ is singular and therefore so is the matrix

$H^{-1}(B + \varepsilon_B - \lambda E)H = (\Lambda - \lambda E) + H^{-1}\varepsilon_B H$.

Two cases are possible:

(1) $\lambda = \lambda_i$ for some i,

(2) $\lambda \ne \lambda_i$ for every i.

In the second case the matrix $\Lambda - \lambda E$ is nonsingular, so

$(\Lambda - \lambda E) + H^{-1}\varepsilon_B H = (\Lambda - \lambda E)(E + (\Lambda - \lambda E)^{-1} H^{-1}\varepsilon_B H)$.

The matrix that is the second factor is singular. This means that
any norm of the matrix $(\Lambda - \lambda E)^{-1} H^{-1}\varepsilon_B H$ must
at least be equal to unity. In particular

$\|(\Lambda - \lambda E)^{-1} H^{-1}\varepsilon_B H\|_2 \ge 1$.

Hence

$\max_{1 \le i \le m} |(\lambda_i - \lambda)^{-1}| \, \|H^{-1}\|_2 \, \|\varepsilon_B\|_2 \, \|H\|_2 \ge 1$

or

$\min_{1 \le i \le m} |\lambda_i - \lambda| \le \|H^{-1}\|_2 \, \|\varepsilon_B\|_2 \, \|H\|_2$.

In the first case this inequality also holds and therefore always

$|\lambda_i - \lambda| \le \nu_H \|\varepsilon_B\|_2$, (89.2)

at least for one value of i. Here

$\nu_H = \|H^{-1}\|_2 \, \|H\|_2$

is the condition number of the matrix H expressed in terms of
spectral norm.

The relation obtained means that whatever the perturbation
$\varepsilon_B$ of the matrix B is, for any eigenvalue $\lambda$ of the
perturbed matrix $B + \varepsilon_B$ there is an eigenvalue $\lambda_i$ of B
such that we have inequality (89.2). Notice that we nowhere required
that $\varepsilon_B$ should be small. Relation (89.2) may be interpreted
somewhat differently. Namely:

The eigenvalues of a perturbed matrix are in the region which is the
union of all disks with centres at $\lambda_i$ and of radius $\nu_H \|\varepsilon_B\|_2$.
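
The statement about disks can be verified on a small matrix of simple structure built from prescribed eigenvalues and eigenvectors. The sketch assumes NumPy; the matrices `H` and `eps_B` are hypothetical, and the particular `H` used here is not claimed to minimize its condition number.

```python
import numpy as np

eigenvalues = np.array([1.0, 2.0, 4.0])      # illustrative spectrum of B
H = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])              # columns are eigenvectors of B
B = H @ np.diag(eigenvalues) @ np.linalg.inv(H)

eps_B = 1e-3 * np.array([[1.0, -2.0, 0.5],
                         [0.3,  1.0, -1.0],
                         [2.0,  0.7, -0.4]])  # illustrative perturbation

nu_H = np.linalg.cond(H, 2)                  # condition number of H, spectral norm
radius = nu_H * np.linalg.norm(eps_B, 2)     # right-hand side of (89.2)

perturbed = np.linalg.eigvals(B + eps_B)
distances = [float(np.min(np.abs(eigenvalues - lam))) for lam in perturbed]

print(radius, distances)
```

Every distance from a perturbed eigenvalue to the nearest original eigenvalue should not exceed the printed radius.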

The columns of the matrix H are eigenvectors of the matrix B.
It follows from (89.2) therefore that as a general measure of
sensitivity of eigenvalues to the perturbation of a matrix we could
apparently take the condition number of the matrix H of eigenvectors
(rather than of the matrix B itself!). The matrix H satisfying (89.1)
is not unique, since the eigenvectors are defined up to arbitrary
factors. It will be assumed that H is always chosen so that its value
$\nu_H$ is a minimum one. We recall that in any case $\nu_H \ge 1$.

If B is a normal matrix and, in particular, Hermitian or unitary,
then we may take H to be a unitary matrix. Then $\nu_H = 1$ and
consequently

$|\lambda_i - \lambda| \le \|\varepsilon_B\|_2$. (89.3)

We consider in somewhat greater detail the case of a Hermitian
matrix B with Hermitian perturbation $\varepsilon_B$. Now we can show that:

Every disk with centre at $\lambda_i$ and of radius $\|\varepsilon_B\|_2$
contains at least one eigenvalue of a perturbed matrix.

Indeed, let us agree to consider a matrix $B + \varepsilon_B$ as the
"original" matrix and the matrix $B = (B + \varepsilon_B) - \varepsilon_B$ as
a "perturbed" matrix with perturbation equal to $-\varepsilon_B$. Repeating
the above calculations word for word we obtain a formula similar to
(89.3) but with the eigenvalues of B and $B + \varepsilon_B$ reversed. This
means that for any eigenvalue $\lambda_i$ of the "perturbed" matrix B
there must be at least one eigenvalue $\lambda$ of the "original" matrix
$B + \varepsilon_B$ for which (89.3) holds.

If the eigenvalues of B are simple, then for a sufficiently small
perturbation $\varepsilon_B$ all the disks become separated and then each
disk will contain one and only one eigenvalue of the perturbed matrix.

Formula (89.3) shows that the eigenvalues of normal matrices
possess a considerable stability to perturbations. In the general
problem of determining eigenvalues, however, this phenomenon is an
exception rather than a rule.

Consider as an example the case, an "extreme" case in a sense,
when the matrix B consists of a single Jordan canonical box. We
may agree to assume that all eigenvectors of such a matrix are
collinear, that the matrix consisting of eigenvectors is singular and
that consequently its condition number equals "infinity". So, let B
be an $m \times m$ matrix of the form

$B = \begin{pmatrix} \lambda_0 & 1 & & 0 \\ & \lambda_0 & \ddots & \\ & & \ddots & 1 \\ 0 & & & \lambda_0 \end{pmatrix}$.

It is obvious that its characteristic polynomial is $(\lambda - \lambda_0)^m$.

Now take such a matrix of perturbation $\varepsilon_B$ in which only one
element, that in position (m, 1), is nonzero and equal to $\varepsilon$.
The characteristic polynomial of the perturbed matrix is
$(\lambda - \lambda_0)^m - \varepsilon$. The eigenvalues of the perturbed
matrix are therefore a distance of $|\varepsilon|^{1/m}$ away from those
of the original matrix. If, for example, m = 20, $\varepsilon = 10^{-10}$
and $\lambda_0$ is of an order of unity, then any practical stability
is out of the question.
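
The behaviour of a perturbed Jordan box can be observed numerically. The sketch assumes NumPy and, to keep the computed eigenvalues reliable, uses a smaller hypothetical size m = 10 instead of m = 20; the variable names are illustrative.

```python
import numpy as np

m, lam0, eps = 10, 1.0, 1e-10                # illustrative values
B = lam0 * np.eye(m) + np.diag(np.ones(m - 1), k=1)   # single Jordan box
eps_B = np.zeros((m, m))
eps_B[m - 1, 0] = eps                        # only the element in position (m, 1) is nonzero

shifts = np.abs(np.linalg.eigvals(B + eps_B) - lam0)
predicted = eps ** (1.0 / m)                 # |eps|^(1/m)

print(predicted, shifts.min(), shifts.max())
```

A perturbation of size 1e-10 moves every eigenvalue by about 0.1, that is, by nine orders of magnitude more than the perturbation itself.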

It is important to understand that the instability of eigenvalues is
not necessarily due to the presence of multiple eigenvalues, nor,
of course, to the presence of Jordan boxes. Let us consider a
$20 \times 20$ matrix $B$:

$$B = \begin{pmatrix} 20 & 20 & & & 0 \\ & 19 & 20 & & \\ & & 18 & \ddots & \\ & & & \ddots & 20 \\ 0 & & & & 1 \end{pmatrix}.$$

It is a triangular matrix and therefore its eigenvalues are the diagonal
elements. On the face of it they are sufficiently well separated and
there seem to be no grounds to expect instability. But let us add a
perturbation $\varepsilon$ to the zero element in position $(20, 1)$. The free term of
the characteristic polynomial will change by an amount of $20^{19}\varepsilon$.
Since the product of the eigenvalues is equal to the free term, the eigenvalues
themselves must change very greatly.
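
The size of this change can be checked in exact arithmetic. The sketch below is an illustration under stated assumptions: the helper names `determinant` and `build_matrix` are hypothetical, and the free term is identified with the determinant of the matrix, which is legitimate here because the order 20 is even. In this sign convention the change equals $-20^{19}\varepsilon$.

```python
import math
from fractions import Fraction


def determinant(rows):
    """Exact determinant by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in rows]
    n = len(a)
    det = Fraction(1)
    for k in range(n):
        # Find a row with a nonzero entry in column k to use as the pivot.
        pivot = next((i for i in range(k, n) if a[i][k] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != k:
            a[k], a[pivot] = a[pivot], a[k]
            det = -det  # a row interchange changes the sign
        det *= a[k][k]
        for i in range(k + 1, n):
            factor = a[i][k] / a[k][k]
            if factor != 0:
                for j in range(k, n):
                    a[i][j] -= factor * a[k][j]
    return det


def build_matrix(eps):
    """The 20 x 20 matrix of the text with eps placed in position (20, 1)."""
    n = 20
    b = [[Fraction(0)] * n for _ in range(n)]
    for i in range(n):
        b[i][i] = Fraction(n - i)       # diagonal 20, 19, ..., 1
        if i + 1 < n:
            b[i][i + 1] = Fraction(20)  # superdiagonal of 20s
    b[n - 1][0] = Fraction(eps)
    return b


eps = Fraction(1, 10**10)
change = determinant(build_matrix(eps)) - determinant(build_matrix(0))
print(change == -Fraction(20) ** 19 * eps)

# The perturbation that makes the matrix singular is 20!/20**19.
critical = Fraction(math.factorial(20), 20**19)
print(float(critical), determinant(build_matrix(critical)) == 0)
```

The critical value $20!/20^{19}$ is of the order of $10^{-7}$: a perturbation of that size already produces a zero eigenvalue, although the unperturbed eigenvalues are the integers 1 to 20.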

Still more complicated questions arise in the study of the stability
of eigenvectors. If an eigenvalue $\lambda$ of a matrix $B$
is unstable under perturbation, then the corresponding eigenvector $x$
clearly cannot be stable, since $B$, $\lambda$ and $x$ are connected by the
relation $Bx = \lambda x$.

It is important to note, however, that even if the eigenvalues remain
unaffected by perturbation, not only may the eigenvectors be
unstable, but their number may also change. For example, the first
of the matrices

$$\begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \begin{pmatrix} 2 & 0 & 0 \\ 0 & 1 & \varepsilon \\ 0 & 0 & 1 \end{pmatrix}$$

has three linearly independent eigenvectors, and the second has two,
although their eigenvalues are equal. Theoretically this phenomenon
is due only to the presence of multiple eigenvalues in the original
matrix. But when a matrix is given only approximately
it is hard, if not impossible, to decide which eigenvalues are to be
considered multiple and which simple.
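
The count of eigenvectors in this example can be verified mechanically: the number of linearly independent eigenvectors belonging to an eigenvalue $\lambda$ of an $n \times n$ matrix $B$ is $n$ minus the rank of $B - \lambda E$. The following Python sketch does this in exact arithmetic; the helper names and the particular value of $\varepsilon$ are assumptions made for illustration.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix by Gaussian elimination over the rationals."""
    a = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(a), len(a[0])
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if a[i][c] != 0), None)
        if pivot is None:
            continue
        a[r], a[pivot] = a[pivot], a[r]
        for i in range(r + 1, n_rows):
            factor = a[i][c] / a[r][c]
            for j in range(c, n_cols):
                a[i][j] -= factor * a[r][j]
        r += 1
        if r == n_rows:
            break
    return r


def independent_eigenvectors(matrix, eigenvalue):
    """Number of linearly independent eigenvectors for one eigenvalue."""
    n = len(matrix)
    shifted = [[matrix[i][j] - (eigenvalue if i == j else 0)
                for j in range(n)] for i in range(n)]
    return n - rank(shifted)


eps = Fraction(1, 1000)  # any nonzero value gives the same counts
first = [[2, 0, 0], [0, 1, 0], [0, 0, 1]]
second = [[2, 0, 0], [0, 1, eps], [0, 0, 1]]

for matrix in (first, second):
    # Both matrices have the eigenvalues 1 (double) and 2 (simple).
    print(sum(independent_eigenvectors(matrix, lam) for lam in (1, 2)))
```

The two printed totals are 3 and 2, in agreement with the text, and the second total does not depend on how small the nonzero $\varepsilon$ is.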

Questions concerning the stability of eigenvalues, eigenvectors
and root vectors are among the most complicated in the sections of
algebra connected with computations.

Exercises 

1. Let $B$ be a matrix of a simple structure but with
multiple eigenvalues. Prove that for any arbitrarily small $\varepsilon > 0$ there is a
perturbation $\varepsilon_B$ satisfying $\|\varepsilon_B\| < \varepsilon$ such that the matrix $B + \varepsilon_B$ is no
longer of a simple structure.

2. Let a matrix $B$ have mutually distinct eigenvalues and let $d > 0$ be the
smallest distance between them. Prove that there is a perturbation $\varepsilon_B$
satisfying $\|\varepsilon_B\|_2 > d$ such that the matrix $B + \varepsilon_B$ is not of a simple structure.

3. Now let $B$ be a Hermitian matrix. Prove that if a Hermitian perturbation
$\varepsilon_B$ satisfies the condition $\|\varepsilon_B\|_2 < d/2$, then the matrix $B + \varepsilon_B$ has
mutually distinct eigenvalues.

4. Finally, let $B$ be a non-Hermitian matrix with mutually distinct
eigenvalues. Prove that there is a number $r$ satisfying $0 < r \le d$ such that the matrix
$B + \varepsilon_B$ is of a simple structure provided $\|\varepsilon_B\|_2 < r$.

5. Try to establish a more exact relation between the numbers $r$ and $d$.

Bilinear and Quadratic Forms

90. General properties of bilinear and quadratic forms
Consider numerical functions $\varphi(x, y)$ of two
independent vector variables $x$ and $y$ of some vector space $K_n$ over a
number field $P$, taking on values from $P$. A function $\varphi(x, y)$ is said
to be a bilinear form if for any vectors $x, y, z \in K_n$ and any number
$\alpha \in P$

$$\begin{aligned} \varphi(x + z, y) &= \varphi(x, y) + \varphi(z, y), & \varphi(\alpha x, y) &= \alpha\varphi(x, y), \\ \varphi(x, y + z) &= \varphi(x, y) + \varphi(x, z), & \varphi(x, \alpha y) &= \alpha\varphi(x, y). \end{aligned} \tag{90.1}$$

The first two of the relations (90.1) imply the linearity of $\varphi(x, y)$
in the first independent variable, the last two imply the linearity in
the second independent variable.

It is easy to verify that a sum of two bilinear forms, as well as a
product of a bilinear form by a number, is again a bilinear form.
Therefore the set of all bilinear forms over the same space $K_n$
assuming values from the same number field $P$ is a vector space. The
"zero" of this space is the bilinear form $0(x, y)$ such that
$0(x, y) = 0$ for all $x$ and $y$. The form $0(x, y)$ is called the zero bilinear form.

We have already encountered a function of this form. Comparing
(27.1) and (90.1) it is easy to notice that a scalar product in a Euclidean
space is a bilinear form. Recalling the important role played
by the scalar product in the study of Euclidean spaces and of linear
operators in them, we may expect that a study of bilinear forms
will turn out to be useful.

A special place among the bilinear forms is occupied by symmetric
and skew-symmetric bilinear forms. A bilinear form $\varphi(x, y)$ is said
to be symmetric if for any vectors $x, y \in K_n$

$$\varphi(x, y) = \varphi(y, x).$$

If, however, for any $x, y \in K_n$

$$\varphi(x, y) = -\varphi(y, x),$$

then the bilinear form is said to be skew-symmetric.

Any skew-symmetric bilinear form $\varphi(x, y)$ assumes a zero value
when its independent variables coincide. Indeed, since
$\varphi(x, x) = -\varphi(x, x)$, we have $\varphi(x, x) = 0$. Somewhat unexpected is
another fact connected with the values of a symmetric bilinear form
when its independent variables coincide. Namely, any symmetric
bilinear form $\varphi(x, y)$ is uniquely defined by its values when its
independent variables coincide. Indeed, let $x$ and $y$ be any vectors from $K_n$.
Taking into account the symmetry of $\varphi(x, y)$, we have

$$\varphi(x + y, x + y) = \varphi(x, x) + \varphi(y, y) + 2\varphi(x, y), \tag{90.2}$$

whence

$$\varphi(x, y) = \tfrac{1}{2}\{\varphi(x + y, x + y) - \varphi(x, x) - \varphi(y, y)\}. \tag{90.3}$$

This formula proves the validity of the above assertion, since the 
right-hand side of the relation is a symmetric bilinear form. 
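
Formula (90.3) is easy to try out on a small example. In the Python sketch below a bilinear form is given by a matrix of its coefficients; the matrices `S` and `N` and the helper names are hypothetical and chosen only for illustration. `N` is deliberately nonsymmetric but generates the same values on coinciding arguments as the symmetric `S`.

```python
from fractions import Fraction


def bilinear(G, x, y):
    """Value of the bilinear form with coefficient matrix G on x and y."""
    n = len(G)
    return sum(G[i][j] * x[i] * y[j] for i in range(n) for j in range(n))


def polar_value(quadratic, x, y):
    """Right-hand side of (90.3): only values at coinciding arguments."""
    s = [a + b for a, b in zip(x, y)]
    return Fraction(quadratic(s) - quadratic(x) - quadratic(y), 2)


S = [[2, 1], [1, 3]]    # symmetric coefficients
N = [[2, 3], [-1, 3]]   # nonsymmetric, same values when x = y
x, y = [1, 2], [3, -1]


def quadratic(v):
    return bilinear(N, v, v)


print(bilinear(S, x, y), bilinear(N, x, y), polar_value(quadratic, x, y))
# prints: 5 -9 5
```

The reconstruction returns the value of the symmetric form, not of the nonsymmetric one from which the values at coinciding arguments were computed.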

A bilinear form is uniquely decomposable into a sum of a symmetric
and a skew-symmetric bilinear form. In explicit form

$$\varphi(x, y) = \tfrac{1}{2}\{\varphi(x, y) + \varphi(y, x)\} + \tfrac{1}{2}\{\varphi(x, y) - \varphi(y, x)\}. \tag{90.4}$$

It is easy to verify that the first two terms at the right yield a symmetric
bilinear form and the last two yield a skew-symmetric form.
Assuming the existence of some other decomposition we shall have
to conclude, on substituting equal independent variables, that the
symmetric part of the decomposition is uniquely defined and that
hence so is the decomposition as a whole.

If the bilinear form is not symmetric, then instead of (90.2) we
shall have

$$\varphi(x + y, x + y) = \varphi(x, x) + \varphi(y, y) + \varphi(x, y) + \varphi(y, x).$$

Consequently

$$\tfrac{1}{2}\{\varphi(x, y) + \varphi(y, x)\} = \tfrac{1}{2}\{\varphi(x + y, x + y) - \varphi(x, x) - \varphi(y, y)\}. \tag{90.5}$$

Comparing this relation with (90.3) we conclude that for a nonsymmetric
bilinear form its symmetric part is uniquely defined by the
values of the form when its independent variables coincide.

Along with bilinear forms we shall consider the so-called quadratic
forms. Let $\varphi(x, y)$ be a bilinear form in a space $K_n$. A quadratic
form is a numerical function $\varphi(x, x)$ of a single independent vector
variable $x \in K_n$ obtained from $\varphi(x, y)$ by replacing the vector $y$
with $x$.

In general it is impossible to reconstruct uniquely from a quadratic
form the bilinear form that has generated it. But, as formula (90.3)
implies, there is one and only one symmetric bilinear form from which
the original quadratic form can be obtained. That bilinear form is
called polar relative to the given quadratic form. The set of all bilinear
forms generating the same quadratic form can be obtained by adding
an arbitrary skew-symmetric form to the polar bilinear form. In
using bilinear forms for the study of the properties of quadratic forms
it suffices therefore to consider only symmetric bilinear forms.

The impossibility of reconstructing a bilinear form from a quadratic
form is explained by the fact that the quadratic form gives
no information about the skew-symmetric part of any bilinear form.

Lemma 90.1. Skew-symmetric bilinear forms, and these forms alone,
assume zero values for all coinciding independent variables.

Proof. We have already noted that if $\varphi(x, y)$ is skew-symmetric,
then $\varphi(x, x) = 0$ for every $x$. If, however, $\varphi(x, x) = 0$ for every $x$,
then from (90.5) it follows that $\varphi(x, y) + \varphi(y, x) = 0$ for all
vectors $x$ and $y$, i.e. the bilinear form $\varphi(x, y)$ is skew-symmetric.

A comparison of the properties of a scalar product and relations
(90.1) shows that in a unitary space, strictly speaking, a scalar product
is not a bilinear form. In a complex space, closely related to a scalar
product are Hermitian bilinear forms. A numerical function $\varphi(x, y)$
is said to be a Hermitian bilinear form if for any vectors $x, y, z \in K_n$
and any number $\alpha$ from the complex field $P$

$$\begin{aligned} \varphi(x + z, y) &= \varphi(x, y) + \varphi(z, y), & \varphi(\alpha x, y) &= \alpha\varphi(x, y), \\ \varphi(x, y + z) &= \varphi(x, y) + \varphi(x, z), & \varphi(x, \alpha y) &= \bar{\alpha}\varphi(x, y). \end{aligned}$$

Here the bar stands for complex conjugation.

Again a sum of two Hermitian bilinear forms, as well as a product 
of a Hermitian bilinear form by a number, is a Hermitian bilinear 
form. The set of all Hermitian bilinear forms over the complex space 
assuming complex values is therefore a complex vector space. 

A Hermitian bilinear form is said to be Hermitian-symmetric if for
any vectors $x, y \in K_n$

$$\varphi(x, y) = \overline{\varphi(y, x)}.$$

If for any $x, y \in K_n$

$$\varphi(x, y) = -\overline{\varphi(y, x)},$$

then the form is called skew-Hermitian. On coinciding vectors the
skew-Hermitian form assumes pure imaginary values and the
Hermitian-symmetric form assumes real values. Now any Hermitian
bilinear form is uniquely defined by its values when its independent
variables coincide. But instead of (90.3) the following relation is
true:

$$\varphi(x, y) = \tfrac{1}{4}\{\varphi(x + y, x + y) - \varphi(x - y, x - y) + i\varphi(x + iy, x + iy) - i\varphi(x - iy, x - iy)\}. \tag{90.6}$$

From this it follows in particular that

Of the Hermitian bilinear forms only the zero form assumes zero values
when all its independent variables coincide.
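
Relation (90.6) can be checked numerically for a form that is neither Hermitian-symmetric nor skew-Hermitian. The sketch below assumes a Hermitian bilinear form written in coordinates as $\sum g_{ij} x_i \bar{y}_j$; the coefficient matrix `G`, the vectors and the helper names are hypothetical. All numbers are small Gaussian integers, so the floating-point arithmetic is exact here.

```python
def hermitian_form(G, x, y):
    """Sum of G[i][j] * x[i] * conj(y[j]): linear in x, conjugate-linear in y."""
    n = len(G)
    return sum(G[i][j] * x[i] * y[j].conjugate()
               for i in range(n) for j in range(n))


def polar_hermitian(quadratic, x, y):
    """Right-hand side of (90.6): only values at coinciding arguments."""
    def shifted(c):
        return [a + c * b for a, b in zip(x, y)]
    return (quadratic(shifted(1)) - quadratic(shifted(-1))
            + 1j * quadratic(shifted(1j))
            - 1j * quadratic(shifted(-1j))) / 4


G = [[1 + 2j, 3 + 0j], [-1j, 4 + 0j]]  # arbitrary complex coefficients
x = [1 + 1j, 2 + 0j]
y = [3 + 0j, 1 - 2j]


def quadratic(v):
    return hermitian_form(G, v, v)


print(polar_hermitian(quadratic, x, y) == hermitian_form(G, x, y))
```

Unlike the situation with (90.3), the full form is recovered here, not just a symmetric part of it.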

In this case, too, a Hermitian bilinear form can be uniquely
represented as a sum of a Hermitian-symmetric and a skew-Hermitian
form, with

$$\varphi(x, y) = \tfrac{1}{2}\{\varphi(x, y) + \overline{\varphi(y, x)}\} + \tfrac{1}{2}\{\varphi(x, y) - \overline{\varphi(y, x)}\}. \tag{90.7}$$

The proofs of the facts for Hermitian forms are much the same as the
corresponding proofs for bilinear forms.

A Hermitian quadratic form is a numerical function $\varphi(x, x)$ of a
single independent vector variable $x \in K_n$ obtained from a Hermitian
bilinear form $\varphi(x, y)$ by replacing the vector $y$ with $x$. Unlike
ordinary quadratic forms, a Hermitian quadratic form allows a unique
reconstruction of the Hermitian bilinear form that generates it. The
reconstruction is carried out according to formula (90.6), and the
corresponding bilinear form is also called polar relative to the original
quadratic form.

The possibility of reconstructing uniquely a Hermitian bilinear form
from the Hermitian quadratic form generated by it is due to a close
relation of Hermitian-symmetric to skew-Hermitian bilinear forms.

Lemma 90.2. If $\varphi(x, y)$ is a Hermitian-symmetric (skew-Hermitian)
bilinear form, then $\psi(x, y) = i\varphi(x, y)$ is a skew-Hermitian
(Hermitian-symmetric) bilinear form.

Proof. Suppose, for example, $\varphi(x, y)$ is Hermitian-symmetric.
Then for all vectors $x$ and $y$ we have

$$\psi(x, y) = i\varphi(x, y) = \varphi(ix, y) = \overline{\varphi(y, ix)} = -\overline{i\varphi(y, x)} = -\overline{\psi(y, x)},$$

i.e. $\psi(x, y)$ is skew-Hermitian. The case of a skew-Hermitian form
$\varphi(x, y)$ can be considered in a similar way.

In what follows we shall more often be concerned with Hermitian 
quadratic forms generated by Hermitian-symmetric bilinear forms. 

Lemma 90.3. Of the Hermitian bilinear forms only Hermitian-symmetric forms
generate real Hermitian quadratic forms.

Proof. As already noted earlier, Hermitian-symmetric forms
assume real values when their independent variables coincide. Suppose
now that a Hermitian quadratic form $\varphi(x, x)$ assumes only real
values. According to (90.6), for the polar bilinear form $\varphi(x, y)$ we have

$$\begin{aligned} \overline{\varphi(y, x)} &= \tfrac{1}{4}\,\overline{\{\varphi(y + x, y + x) - \varphi(y - x, y - x) + i\varphi(y + ix, y + ix) - i\varphi(y - ix, y - ix)\}} \\ &= \tfrac{1}{4}\,\overline{\{\varphi(x + y, x + y) - \varphi(x - y, x - y) + i\varphi(x - iy, x - iy) - i\varphi(x + iy, x + iy)\}} \\ &= \tfrac{1}{4}\{\varphi(x + y, x + y) - \varphi(x - y, x - y) + i\varphi(x + iy, x + iy) - i\varphi(x - iy, x - iy)\} = \varphi(x, y), \end{aligned}$$

i.e. the form $\varphi(x, y)$ is Hermitian-symmetric.





Corollary. Of the Hermitian bilinear forms only skew-Hermitian
forms generate pure imaginary Hermitian quadratic forms.

Corollary. No Hermitian nonsymmetric bilinear form can generate
a real Hermitian quadratic form.

As follows from the properties of linearity of bilinear and Hermitian
bilinear forms in each independent variable, $\varphi(0, 0) = 0$ for
any quadratic form $\varphi(x, x)$. In the general case, however, there may
also be nonzero vectors $x$ such that $\varphi(x, x) = 0$. These vectors will
be called isotropic. The concept of isotropy is connected only with
a given quadratic form. Therefore vectors isotropic for one quadratic form
may be nonisotropic for another and vice versa. In particular,
Lemma 90.1 implies that for a quadratic form generated by a skew-symmetric
bilinear form all vectors in $K_n$, except the zero vector, are
isotropic.

Of the ordinary and Hermitian real forms the most widely used
are the forms that assume values of the same sign for all values of the
independent vector variable. A real quadratic form $\varphi(x, x)$ is said to be positive
definite if $\varphi(x, x) > 0$ for every $x \ne 0$. The form is said to be
nonnegative if $\varphi(x, x) \ge 0$ for every $x \ne 0$. Similar definitions can be
obtained for nonpositive and negative definite quadratic forms.

It is only positive definite and negative definite quadratic forms
as a rule that are called forms of constant signs. But sometimes this
term is also applied to nonnegative and nonpositive quadratic forms.
To avoid confusion, whenever necessary positive definite and negative
definite quadratic forms will be called forms strictly of constant
signs.

If a quadratic form is a form of constant signs, then the ordinary
or Hermitian bilinear form generating it will also be said to be positive
definite, nonnegative and so on.

If a real quadratic form $\varphi(x, x)$ is strictly of constant signs, then
it has no isotropic vectors. In the case of real bilinear and
Hermitian-symmetric bilinear forms $\varphi(x, y)$ the corresponding quadratic
forms will be real, and the converse is true for them. That is, we have

Theorem 90.1. Let a quadratic form $\varphi(x, x)$ be generated by a real
bilinear or Hermitian-symmetric bilinear form $\varphi(x, y)$. If $\varphi(x, x)$
has no isotropic vectors, then it is strictly of constant signs.

Proof. As already noted, the quadratic form $\varphi(x, x)$ is real. In
both cases it assumes values of the same sign on collinear vectors.
Suppose $\varphi(x, x)$ is not strictly of constant signs. Then we can find
linearly independent vectors $u$ and $v$ such that $\varphi(u, u) > 0$ and
$\varphi(v, v) < 0$. For any real number $\alpha$

$$\varphi(u + \alpha v, u + \alpha v) = \varphi(u, u) + \alpha(\varphi(u, v) + \varphi(v, u)) + \alpha^2\varphi(v, v). \tag{90.8}$$

The right-hand side of this is a second-degree polynomial in $\alpha$.
It has real coefficients, which follows from the reality of $\varphi(x, x)$
and Lemma 90.3. Since $\varphi(u, u)$ and $\varphi(v, v)$ have opposite signs,
polynomial (90.8) will have two real roots. Let $\alpha_0$ be one of them.
This means that $\varphi(u + \alpha_0 v, u + \alpha_0 v) = 0$. However, the vector
$u + \alpha_0 v$ is nonzero by virtue of the linear independence of $u$ and $v$, so
the vanishing of the quadratic form on it is impossible under the
hypothesis of the theorem. This contradiction completes the proof.
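
The construction used in the proof can be followed numerically. In the Python sketch below a real symmetric form that takes values of both signs is given by a hypothetical coefficient matrix `G`; a root $\alpha_0$ of polynomial (90.8) is computed, and the vector $u + \alpha_0 v$ is checked to be isotropic up to rounding. The names and the numerical data are assumptions made for illustration.

```python
import math


def form(G, x, y):
    """Value of the bilinear form with coefficient matrix G on x and y."""
    n = len(G)
    return sum(G[i][j] * x[i] * y[j] for i in range(n) for j in range(n))


G = [[2.0, 1.0], [1.0, -3.0]]  # symmetric, takes values of both signs
u = [1.0, 0.0]                 # form(G, u, u) = 2 > 0
v = [0.0, 1.0]                 # form(G, v, v) = -3 < 0

# Coefficients of the polynomial (90.8) in alpha.
c0 = form(G, u, u)
c1 = form(G, u, v) + form(G, v, u)
c2 = form(G, v, v)

# c0 and c2 have opposite signs, so the discriminant is positive.
discriminant = c1 * c1 - 4.0 * c2 * c0
alpha0 = (-c1 + math.sqrt(discriminant)) / (2.0 * c2)

w = [ui + alpha0 * vi for ui, vi in zip(u, v)]
print(w, form(G, w, w))  # the second number is zero up to rounding
```

The vector `w` is nonzero because its first coordinate equals 1, so the form has an isotropic vector, exactly as the argument predicts.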

It is no accident that we restricted our discussion in Theorem 90.1
to quadratic forms generated only by real bilinear and
Hermitian-symmetric bilinear forms. No other bilinear form can lead
to a real quadratic form. Actually it only remains to consider the
bilinear form in a complex space. But it is impossible for such a
bilinear form to generate a real quadratic form that is not identically zero.
If for some vector $u$ the quadratic form takes on a nonzero real value
$\varphi(u, u)$, then $\varphi(\alpha u, \alpha u) = \alpha^2\varphi(u, u)$ will be a nonreal complex number
for any complex $\alpha$ with nonzero real and imaginary parts.
So

For real quadratic forms to have no isotropic vectors it is necessary
and sufficient that they should be strictly of constant signs.

A complex bilinear form always generates a quadratic form with
isotropic vectors provided it is defined on a vector space of dimension
greater than unity. Indeed, assuming that this is not the case
we can always find linearly independent vectors $u$ and $v$ such that
$\varphi(u, u) \ne 0$ and $\varphi(v, v) \ne 0$. But according to (90.8) the vector
$u + \alpha v$ will be isotropic under a suitable choice of the complex number $\alpha$.
A Hermitian bilinear complex form can generate a quadratic form
having no isotropic vectors. It follows from our studies that

For a quadratic form generated by a Hermitian bilinear form to have
no isotropic vectors it is sufficient that the real (or imaginary) part of
the quadratic form should be strictly of constant signs.


Exercises 

1. Prove that given any bilinear form $\varphi(x, y)$, the equations
$\varphi(0, y) = \varphi(x, 0) = 0$ hold for any $x, y \in K_n$.

2. Find the dimension and a basis of a vector space of bilinear forms.

3. Prove that sets of symmetric and skew-symmetric bilinear forms
constitute subspaces in the vector space of all bilinear forms.

4. Prove that the space of all bilinear forms is a direct sum of subspaces of
symmetric and skew-symmetric bilinear forms.

5. Prove that the set of all quadratic forms constitutes a vector space. Find
its dimension and a basis.

6. Are the following sets of quadratic forms linear subspaces:

the quadratic forms of constant signs, 

the quadratic forms assuming real values, 

the quadratic forms having no isotropic vectors, 

the quadratic forms for which all vectors of a given set are isotropic? 

7. Prove that, given any quadratic form in a normed space, there is a
number $\alpha$ such that for every $x$

$$|\varphi(x, x)| \le \alpha\|x\|^2.$$

8. Let $\varphi(x, x)$ be a quadratic form strictly of constant signs and $\psi(x, x)$
an arbitrary quadratic form. Prove that there is a number $\beta$ such that for every $x$

$$|\psi(x, x)| \le \beta\varphi(x, x).$$

9. Prove that a quadratic form is not strictly of constant signs if and only
if the set of isotropic vectors and the zero vector form a linear subspace.

10. Consider Exercises 1 to 9 for Hermitian bilinear and quadratic forms.
Do all the assertions remain valid?

11. Suppose that in a complex space $K_n$ some subspace $L$ consists only of
isotropic vectors of a Hermitian bilinear form $\varphi(x, y)$ and a zero vector. Prove
that $\varphi(u, v) = 0$ for any vectors $u, v \in L$.

91. The matrices of bilinear and quadratic forms

We investigate a bilinear form $\varphi(x, y)$ in a
space $K_n$. Choose in $K_n$ two fixed bases, $e_1, e_2, \ldots, e_n$ and
$q_1, q_2, \ldots, q_n$, and let

$$x = \sum_{i=1}^{n} \xi_i e_i, \qquad y = \sum_{j=1}^{n} \eta_j q_j.$$

Then by property (90.1) we have

$$\varphi(x, y) = \varphi\Bigl(\sum_{i=1}^{n} \xi_i e_i, \sum_{j=1}^{n} \eta_j q_j\Bigr) = \sum_{i=1}^{n}\sum_{j=1}^{n} \varphi(e_i, q_j)\,\xi_i\eta_j. \tag{91.1}$$

Denote as before by $x_e$ and $y_q$ the $n \times 1$ matrices made up of the
coordinates of vectors $x$ and $y$ in the corresponding bases and by $G_{eq}$ an
$n \times n$ matrix with elements $g_{ij}^{(eq)} = \varphi(e_i, q_j)$. Relation (91.1)
implies that

$$\varphi(x, y) = x_e' G_{eq} y_q. \tag{91.2}$$

Thus, given fixed bases in $K_n$, the bilinear form can be represented
in matrix form (91.2).

The matrix $G_{eq}$ is called the matrix of the bilinear form and is
uniquely defined given fixed bases. Indeed, assuming that for $\varphi(x, y)$ there is
another similar representation with some matrix $F_{eq}$ besides (91.2)
and taking $x = e_i$ and $y = q_j$, we at once get $f_{ij}^{(eq)} = \varphi(e_i, q_j)$, i.e.
$F_{eq} = G_{eq}$.

Note that the right-hand side of (91.2) defines some bilinear form
whatever the matrix $G_{eq}$. The validity of (90.1) is immediate from
the corresponding properties of matrix operations. Thus, given fixed
bases in $K_n$, there is a 1-1 correspondence between bilinear forms and
square matrices.
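
The passage from a bilinear form to its matrix can be illustrated as follows. The Python sketch takes a hypothetical bilinear form `phi` on a two-dimensional real space, builds the matrix with elements $\varphi(e_i, q_j)$ for two hypothetical bases, and compares the direct value of the form with the value given by (91.2). All names and numbers are assumptions made for the example.

```python
def phi(x, y):
    """A hypothetical bilinear form on a two-dimensional real space."""
    return x[0] * y[0] + 2 * x[0] * y[1] - x[1] * y[0] + 3 * x[1] * y[1]


def combine(coords, basis):
    """The vector with the given coordinates in the given basis."""
    dim = len(basis[0])
    return tuple(sum(c * b[k] for c, b in zip(coords, basis))
                 for k in range(dim))


e = [(1, 1), (1, -1)]   # basis used for the first argument
q = [(2, 0), (1, 1)]    # basis used for the second argument

# Matrix of the form: the element in row i, column j is phi(e_i, q_j).
G = [[phi(ei, qj) for qj in q] for ei in e]

xi, eta = (3, -2), (1, 4)          # coordinates of x and y
x, y = combine(xi, e), combine(eta, q)

matrix_value = sum(xi[i] * G[i][j] * eta[j]
                   for i in range(2) for j in range(2))
print(G, phi(x, y) == matrix_value)
```

The comparison succeeds for any choice of coordinates, since it only restates (91.1).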

Changing the bases in $K_n$ affects the matrix of the bilinear form,
of course. Let $P$ be the coordinate transformation matrix for a change
from $e_1, e_2, \ldots, e_n$ to $f_1, f_2, \ldots, f_n$ and $Q$ the coordinate
transformation matrix for a change from $q_1, q_2, \ldots, q_n$ to $t_1, t_2, \ldots, t_n$.
By (63.3)

$$x_e = Px_f, \qquad y_q = Qy_t, \tag{91.3}$$

and therefore it follows from (91.2) that

$$\varphi(x, y) = x_e' G_{eq} y_q = x_f' P' G_{eq} Q y_t.$$

But on the other hand

$$\varphi(x, y) = x_f' G_{ft} y_t.$$

Consequently

$$G_{ft} = P' G_{eq} Q. \tag{91.4}$$

Since $P$ and $Q$ are nonsingular, in accordance with the terminology
introduced in Section 64 we shall call matrices $G_{ft}$ and $G_{eq}$ equivalent.
As shown earlier, equivalent matrices of the same size, and
only such matrices, have the same rank. This means that the rank of
the matrix of a bilinear form is independent of the choice of bases
and is a characteristic of the form itself. We shall call it the rank
of the bilinear form. A bilinear form is said to be nonsingular if so is
its matrix. A characteristic of a bilinear form is also the difference
between the dimension of the space $K_n$ and the rank of the form. We
shall call it the nullity of the bilinear form.

It follows from the results of Section 64 that all matrices of the
same rank are equivalent to a diagonal matrix with elements 0 and 1.
In terms of bilinear forms this means that for an arbitrary form of
rank $r$ we can always find bases $f_1, f_2, \ldots, f_n$ and $t_1, t_2, \ldots, t_n$
such that the form is of the simplest type. That is, if

$$x = \sum_{i=1}^{n} \xi_i f_i, \qquad y = \sum_{j=1}^{n} \eta_j t_j,$$

then

$$\varphi(x, y) = \sum_{i=1}^{r} \xi_i\eta_i.$$

A separate choice of bases for each variable of the bilinear form is
made fairly rarely. It is more usual to choose a common basis. Let
$e_1, e_2, \ldots, e_n$ be some basis of $K_n$ and

$$x = \sum_{i=1}^{n} \xi_i e_i, \qquad y = \sum_{j=1}^{n} \eta_j e_j.$$

In this case, as in (91.1), we obtain the following representation of a
bilinear form:

$$\varphi(x, y) = \sum_{i=1}^{n}\sum_{j=1}^{n} \varphi(e_i, e_j)\,\xi_i\eta_j,$$

or in matrix notation

$$\varphi(x, y) = x_e' G_e y_e. \tag{91.5}$$

Here $G_e$ is a matrix with elements $g_{ij}^{(e)} = \varphi(e_i, e_j)$. It is $G_e$ that
will throughout be called the matrix of a bilinear form. If again $P$
is a coordinate transformation matrix for a change from $e_1, e_2, \ldots, e_n$
to $f_1, f_2, \ldots, f_n$, then according to (91.4) matrices $G_e$ and $G_f$ of the
same bilinear form $\varphi(x, y)$ will be related by

$$G_f = P' G_e P. \tag{91.6}$$

The matrices $G_e$ and $G_f$ related by (91.6), with $P$ nonsingular, are
called congruent. Congruent matrices are always equivalent. In general
the converse is not true, of course.
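
Relation (91.6) can be checked on a small example. The sketch below assumes that the old basis is the standard one, so that the columns of $P$ hold the coordinates of the new basis vectors, in agreement with $x_e = Px_f$; the matrices and helper names are hypothetical.

```python
def transpose(m):
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def phi(G, x, y):
    """Value of the bilinear form with matrix G on coordinate columns x, y."""
    n = len(G)
    return sum(x[i] * G[i][j] * y[j] for i in range(n) for j in range(n))


G_e = [[1, 2], [0, 3]]   # matrix of the form in the old basis
P = [[1, 1], [0, 2]]     # nonsingular; column j holds the coordinates of f_j

# Matrix in the new basis according to (91.6).
G_f = matmul(matmul(transpose(P), G_e), P)

# The same matrix computed from the definition: elements phi(f_i, f_j).
f = transpose(P)         # the rows of f are the new basis vectors
direct = [[phi(G_e, fi, fj) for fj in f] for fi in f]

print(G_f, G_f == direct)  # prints: [[1, 5], [1, 17]] True
```

Both computations give the same matrix, as they must, since (91.6) is only a restatement of the definition of the matrix of a form in the new basis.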

What was said about bilinear forms carries over with slight
changes to Hermitian bilinear forms. Every Hermitian form can be
represented uniquely in matrix notation

$$\varphi(x, y) = x_e' G_{eq} \bar{y}_q$$

with $e_1, e_2, \ldots, e_n$ and $q_1, q_2, \ldots, q_n$ fixed. Under a change to
$f_1, f_2, \ldots, f_n$ and $t_1, t_2, \ldots, t_n$, instead of (91.4) we have

$$G_{ft} = P' G_{eq} \bar{Q}.$$

If the independent variables of the Hermitian bilinear form are given
in a single basis, then the matrix notation of the form is similar
to (91.5). That is,

$$\varphi(x, y) = x_e' G_e \bar{y}_e. \tag{91.7}$$

Under a change to a new basis the matrices of the form are related by

$$G_f = P' G_e \bar{P}$$

and we shall say that they are Hermitian-congruent.

Now we can establish a relation between the type of a bilinear form
and the type of its matrix. If the form is symmetric, then for any
basis $e_1, e_2, \ldots, e_n$

$$g_{ij}^{(e)} = \varphi(e_i, e_j) = \varphi(e_j, e_i) = g_{ji}^{(e)},$$

i.e. $G_e = G_e'$ and the matrix $G_e$ of the form $\varphi(x, y)$ is symmetric.
If, however, the form is skew-symmetric, then

$$g_{ij}^{(e)} = \varphi(e_i, e_j) = -\varphi(e_j, e_i) = -g_{ji}^{(e)},$$

i.e. $G_e = -G_e'$. In this case the matrix $G_e$ is also called
skew-symmetric.

The converse is also true. If in some basis the matrix of a form is
symmetric (skew-symmetric), then so is the bilinear form generating
it. Let $G_e = G_e'$, then

$$\varphi(y, x) = y_e' G_e x_e = (y_e' G_e x_e)' = x_e' G_e' y_e = x_e' G_e y_e = \varphi(x, y).$$

If, however, $G_e' = -G_e$, then

$$\varphi(y, x) = y_e' G_e x_e = (y_e' G_e x_e)' = x_e' G_e' y_e = -x_e' G_e y_e = -\varphi(x, y).$$

Similar assertions hold also for the relation of the Hermitian
bilinear form to its matrix. If the form is Hermitian-symmetric, then

$$g_{ij}^{(e)} = \varphi(e_i, e_j) = \overline{\varphi(e_j, e_i)} = \bar{g}_{ji}^{(e)},$$

i.e. $G_e = G_e^*$ and the matrix $G_e$ of the form $\varphi(x, y)$ is Hermitian.
If the form is skew-Hermitian, then

$$g_{ij}^{(e)} = \varphi(e_i, e_j) = -\overline{\varphi(e_j, e_i)} = -\bar{g}_{ji}^{(e)},$$

i.e. $G_e = -G_e^*$. In this case $G_e$ is said to be skew-Hermitian.

The converse statements are also true. Let $G_e = G_e^*$. Then for the
generating Hermitian bilinear form we have

$$\overline{\varphi(y, x)} = \overline{y_e' G_e \bar{x}_e} = \overline{(y_e' G_e \bar{x}_e)'} = \overline{\bar{x}_e' G_e' y_e} = x_e' G_e^* \bar{y}_e = x_e' G_e \bar{y}_e = \varphi(x, y).$$

For the case $G_e = -G_e^*$ we find

$$\overline{\varphi(y, x)} = \overline{y_e' G_e \bar{x}_e} = \overline{(y_e' G_e \bar{x}_e)'} = \overline{\bar{x}_e' G_e' y_e} = x_e' G_e^* \bar{y}_e = -x_e' G_e \bar{y}_e = -\varphi(x, y).$$

The matrix of a zero bilinear form consists only of zero elements,
i.e. is a zero matrix. It is the only matrix that is simultaneously
symmetric and skew-symmetric, as is the zero form.

We have already noted that there is a very close connection between
symmetric bilinear and quadratic forms. It is especially obvious on
the matrix level. For a bilinear form $\varphi(x, y)$ the matrix relation
(91.5) holds. For the corresponding quadratic form we have

$$\varphi(x, x) = x_e' G_e x_e. \tag{91.8}$$

For a fixed basis $e_1, e_2, \ldots, e_n$, given any matrix $G_e$, (91.8) defines
some quadratic form. The matrix $G_e$ in (91.8) is now called not the
matrix of a bilinear form but the matrix of a quadratic form.

While for bilinear forms there is a 1-1 correspondence between the
forms and their matrices given a fixed basis in $K_n$, there is no longer
such a correspondence now. Every quadratic form can be given by
the entire set of its matrices. This set contains only one symmetric
matrix and the difference between any two matrices of a given set is
a skew-symmetric matrix.

Thus any ordinary quadratic form can always be given by a symmetric
matrix. Changing to a different basis affects the matrices of the
quadratic form according to (91.6). We therefore conclude again that
problems of investigating symmetric bilinear and quadratic forms
are closely related. For Hermitian quadratic forms this is no longer
the case, since there is a 1-1 correspondence between them and Hermitian
bilinear forms and there is a 1-1 correspondence between their
matrices.

As with bilinear forms, the rank of a quadratic form is the rank of
its matrix in any basis. If the matrix of a quadratic form is nonsingular,
then the quadratic form is also called nonsingular.

Essentially the study of bilinear forms is the study of their matrices
in different bases or, equivalently, the study of a class of congruent
matrices. All our immediate studies therefore will be concerned with
the investigation of classes of congruent and Hermitian-congruent
matrices.

A number of properties for such classes follow at once from the
preceding results. Thus a matrix congruent with a symmetric (skew-symmetric)
matrix will necessarily be symmetric (skew-symmetric). In
particular, a matrix congruent with a diagonal matrix is symmetric.
From this we conclude that a nonzero symmetric matrix is never
congruent with a skew-symmetric matrix, although it may be equivalent
to that matrix, and that a nonzero skew-symmetric matrix
can never be congruent with a diagonal matrix. A matrix
Hermitian-congruent with a Hermitian (skew-Hermitian) matrix is necessarily
Hermitian (skew-Hermitian). Of the diagonal matrices it is only the
matrix with real (pure imaginary) elements that can be Hermitian
(skew-Hermitian).

In accordance with decompositions (90.4) and (90.7) of bilinear and
Hermitian bilinear forms we obtain decompositions of an arbitrary
matrix as a sum of a symmetric and a skew-symmetric matrix as
well as that of a Hermitian and a skew-Hermitian matrix. These
decompositions can be written out in explicit form:

$$A = \tfrac{1}{2}(A + A') + \tfrac{1}{2}(A - A'),$$

$$A = \tfrac{1}{2}(A + A^*) + \tfrac{1}{2}(A - A^*).$$

If $A$ is the matrix of a bilinear form, then the first terms of the
right-hand sides are the matrices of the symmetric parts of the bilinear form
and the second terms are the matrices of the skew-symmetric parts
of the same form.
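
For a real matrix the first of these decompositions is sketched below in Python with exact rational arithmetic. The matrix `A` and the helper names are hypothetical. The example also confirms that the quadratic form $x'Ax$ is determined by the symmetric part alone.

```python
from fractions import Fraction


def symmetric_part(a):
    """The matrix (A + A') / 2."""
    n = len(a)
    return [[Fraction(a[i][j] + a[j][i], 2) for j in range(n)]
            for i in range(n)]


def skew_part(a):
    """The matrix (A - A') / 2."""
    n = len(a)
    return [[Fraction(a[i][j] - a[j][i], 2) for j in range(n)]
            for i in range(n)]


def quadratic(a, x):
    """Value of the quadratic form x' A x."""
    n = len(a)
    return sum(x[i] * a[i][j] * x[j] for i in range(n) for j in range(n))


A = [[1, 4], [2, 3]]
S, K = symmetric_part(A), skew_part(A)

# S is [[1, 3], [3, 3]], K is [[0, 1], [-1, 0]], and S + K restores A.
restored = [[S[i][j] + K[i][j] for j in range(2)] for i in range(2)]
x = [5, -2]
print(restored == A, quadratic(A, x) == quadratic(S, x), quadratic(K, x))
# prints: True True 0
```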

We shall often carry over to matrices without comment the terminology
introduced for bilinear and quadratic forms. For example, we
shall call a matrix positive definite, meaning by this that it is a matrix
of a positive definite form and so on.

One of the major problems connected with the bilinear form is that
of determining the simplest form its matrix can be reduced to by
changing the basis and finding the appropriate basis. This problem
will be called the problem of transforming a bilinear form or the
problem of reducing it to the simplest form.

In matrix interpretation the transformation problem can be stated
as follows:

Given a matrix $A$, find a nonsingular matrix $P$ such that the matrix

$$C = P'AP \tag{91.9}$$

congruent with $A$ has the simplest form.

Essentially this amounts to factoring the matrix, since it follows
from (91.9) that

$$A = (P^{-1})' C P^{-1}.$$

For Hermitian bilinear forms, of course, instead of (91.9) we shall
consider the transformation

$$C = P'A\bar{P}. \tag{91.10}$$

Computationally it is important that the matrix $P$ in (91.9) and
(91.10) should not be very complicated. That is because in finding
new coordinates of vectors in terms of the old coordinates according
to (63.3) one has to solve a system of linear algebraic equations with
the matrix $P$, and this solution must be carried
out sufficiently fast. In some cases it is more convenient to seek the
matrix $P^{-1}$ instead of $P$.

Some other forms of notation for bilinear and quadratic forms may
be used besides those considered above. Sometimes we shall give
them in explicit form:

$$\Phi = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ji} x_i y_j, \qquad F = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ji} x_i x_j. \tag{91.11}$$

This notation can be simplified. For example, let the space be real.
Then so are both the bilinear form and the matrix $A$ of coefficients $a_{ji}$.
We introduce a space $R_n$ whose elements are the column vectors

$$x = (x_1, x_2, \ldots, x_n)', \qquad y = (y_1, y_2, \ldots, y_n)'$$

and suppose that the scalar product is introduced as a sum of pairwise
products of coordinates. Now we can write:

$$\Phi = (Ax, y), \qquad F = (Ax, x). \tag{91.12}$$

For Hermitian bilinear forms written as

$$\Phi = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ji} x_i \bar{y}_j, \qquad F = \sum_{i=1}^{n}\sum_{j=1}^{n} a_{ji} x_i \bar{x}_j$$

we again have (91.12) if of course the scalar product is introduced as
a sum of the products of the coordinates of the first vector by the
complex conjugate coordinates of the second vector.





Exercises

1. Prove that the determinant of a Hermitian matrix
is a real number.

2. What kind of number is the determinant of a skew-Hermitian matrix?

3. Prove that the rank of a skew-symmetric matrix is an even number.

4. Bilinear forms $\varphi(x, y)$ and $\varphi(y, x)$ are in general different. What can be
said about their matrices?

5. Prove that the rank of a sum of bilinear forms does not exceed the sum
of the ranks of the summands.

6. Prove that every bilinear form of rank $r$ can be represented as a sum of $r$
bilinear forms of rank 1.

7. Prove that every bilinear form $\varphi(x, y)$ of rank 1 can be represented as

$$\varphi(x, y) = \varphi(x, a)\,\varphi(b, y)$$

for some vectors $a$ and $b$. Is this representation unique?

92. Reduction to canonical form

Before proceeding to the study of various areas
of application of bilinear and quadratic forms we consider a general
method of congruence and Hermitian congruence transformation of
matrices to a simple form.

Given a square $n \times n$ matrix $A$, find a nonsingular matrix $P$ such
that the matrix $C = P'AP$ has a sufficiently simple form. Under
Hermitian congruence transformation, it is the matrix $C = P'A\bar{P}$
that must have a simple form. We shall now describe a general
transformation method suitable for all matrices $A$. Differences between
the congruence and Hermitian congruence transformations are
insignificant. To be definite, we shall therefore assume that it is the
congruence transformation of the matrix that is carried out.

The method consists in constructing a finite sequence of matrices
$A_0 = A, A_1, A_2, \ldots$, where each subsequent matrix is congruent
with the preceding matrix, i.e.

$$A_{k+1} = P_{k+1}' A_k P_{k+1}$$

for some matrix $P_{k+1}$. Since the congruence relation is transitive,
the last matrix of the sequence will be congruent with the original matrix $A$.
The principle of constructing the sequence of matrices $A_k$ relies on
obtaining in the matrix $A_{k+1}$ more zero elements for every $k$ than there
are in $A_k$. Moreover, each time we compute the matrix $P_{k+1}$ from $A_k$
we shall require that not only should there appear new zero elements
in $A_{k+1}$ but also that there should remain all zero elements obtained
at all the preceding steps.

The transformation of a matrix $A_k$ into $A_{k+1}$ will be called a basic
step of the method. Every basic step may consist of several auxiliary
steps. They will all be reduced to elementary operations: interchanging
of matrix columns (rows), addition to one column (row) of another
column (row) multiplied by a number, multiplication of a column (row)
by a number. We describe the auxiliary steps in terms of
transformations of a matrix A into a matrix C = P'AP congruent
with it, dropping for simplicity the index k.

A. In a matrix A the element $a_{11} \neq 0$. There is a nonsingular
matrix P such that for the elements of the first column of a matrix
C = P'AP

$$c_{j1} = \begin{cases} a_{11}, & j = 1, \\ 0, & j \neq 1. \end{cases} \qquad (92.1)$$

The matrix P differs from a unit matrix in having a different first
row, with

$$p_{1j} = \begin{cases} 1, & j = 1, \\ -\dfrac{a_{j1}}{a_{11}}, & j \neq 1. \end{cases} \qquad (92.2)$$

Multiplying A on the left by the matrix P' does not affect the first row
of A and makes zero all off-diagonal elements of the first column of
the matrix P'A. Multiplying P'A on the right by P does not affect the
first column of P'A.
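
Step A is easy to check numerically. The following is only a sketch under our own assumptions: the helper names `transpose`, `matmul` and `step_a_matrix` and the sample matrix are not part of the text, and exact rational arithmetic from Python's `fractions` module is used so that the zeros come out exactly. The matrix P is a unit matrix whose first row is filled with $-a_{j1}/a_{11}$ as in (92.2).

```python
from fractions import Fraction

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def step_a_matrix(a):
    """Unit matrix whose first row is replaced by p_1j = -a_j1 / a_11."""
    n = len(a)
    if a[0][0] == 0:
        raise ValueError("step A needs a nonzero element a_11")
    p = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    for j in range(1, n):
        p[0][j] = -Fraction(a[j][0]) / a[0][0]
    return p

# Hypothetical sample matrix with a_11 != 0.
A = [[2, 1, 3], [4, 5, 6], [-2, 7, 8]]
P = step_a_matrix(A)
C = matmul(matmul(transpose(P), A), P)   # C = P'AP
first_column = [row[0] for row in C]     # a_11 followed by zeros
```

In this example the first column of C consists of $a_{11}$ followed by zeros, while the first row of C remains, in general, nonzero.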

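Note one important fact. All matrix minors in the upper left-hand
corner of a matrix will be called principal minors. Since the matrix P
is right triangular and each of its diagonal elements is equal to unity,
of all minors in the first r columns only the principal minor is
nonzero, and it is equal to unity. All principal minors will therefore
coincide in the matrices A and C. Indeed, using the Binet-Cauchy
formula we find

$$C\binom{1\ 2\ \dots\ r}{1\ 2\ \dots\ r} = \sum_{k_1 < \dots < k_r} \ \sum_{l_1 < \dots < l_r} P'\binom{1\ 2\ \dots\ r}{k_1\ k_2\ \dots\ k_r} A\binom{k_1\ k_2\ \dots\ k_r}{l_1\ l_2\ \dots\ l_r} P\binom{l_1\ l_2\ \dots\ l_r}{1\ 2\ \dots\ r} = A\binom{1\ 2\ \dots\ r}{1\ 2\ \dots\ r}.$$

We shall use this remark later on.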

B. In a matrix A the element $a_{11}$ is 0, but some element $a_{jj}$ is other
than 0, where j > 1. There is a nonsingular matrix P such that for
a matrix C = P'AP the element $c_{11} = a_{jj}$ is other than 0. The
matrix P differs from a unit matrix only in the four elements at the
intersections of rows and columns with indices 1, j. In those positions P
is of the form $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. Multiplying A on the right by P interchanges in A
the columns with indices 1, j. Multiplying the matrix AP on the left
by P' interchanges in AP the rows with indices 1, j.

C. In a matrix A all diagonal elements are zero, but there are
indices j, l, where j < l, such that $a_{jl} + a_{lj} \neq 0$. There is a nonsingular
matrix P such that for a matrix C = P'AP the element
$c_{jj} = a_{jl} + a_{lj}$ is other than 0. The matrix P differs from a unit matrix in one
element $p_{lj} = 1$. Multiplying A on the right by P adds to the jth
column of A its lth column. Multiplying the matrix AP on the left
by P' adds to the jth row of AP its lth row.

D. A matrix A is nonzero skew-symmetric, the element $a_{12}$ is 0,
but some element $a_{jl}$ is other than 0, where j < l. There is a
nonsingular matrix P such that in a skew-symmetric matrix C = P'AP
the element $c_{12} = a_{jl}$ is other than 0. The matrix P is represented as
a product $P = P_1 P_2$. The matrices $P_1$ and $P_2$ differ from unit
matrices in the four elements at the intersections of rows and columns
with indices 1, j and 2, l respectively. In those positions $P_1$ and $P_2$
are of the form $\begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}$. As already stated, multiplying on the right by
these matrices interchanges the columns and multiplying on the left
interchanges the rows.

E. The matrix of the third-order principal minor of a matrix A is
of the form

$$\begin{pmatrix} a_{11} & a_{12} & a_{13} \\ 0 & 0 & a_{23} \\ 0 & a_{32} & 0 \end{pmatrix}, \qquad (92.3)$$

where the elements $a_{11}$, $a_{23}$ and $a_{32}$ are nonzero. There is a
nonsingular matrix P such that the first three principal minors in C = P'AP
are nonzero. The matrix P differs from a unit matrix in one element
$p_{31}$, which may be any number save 0, $-a_{12} a_{32}^{-1}$ and $-a_{11} a_{13}^{-1}$.
Multiplying A on the right by P adds to the first column of A its third
column multiplied by $p_{31}$. Multiplying AP on the left by P' adds to
the first row of AP its third row multiplied by $p_{31}$.
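
The effect of step E on the principal minors can be seen on a small example. The sketch below rests on our own assumptions: the helpers `transpose`, `matmul`, `det` and `leading_minors` and the sample 3 × 3 matrix are illustrative only. For a matrix of the form (92.3) the second principal minor vanishes, and after the transformation with an admissible value of $p_{31}$ all three principal minors are nonzero.

```python
def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def det(m):
    # Cofactor expansion along the first row; adequate for small matrices.
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det([row[:j] + row[j + 1:] for row in m[1:]])
               for j in range(len(m)))

def leading_minors(m):
    """Principal minors in the upper left-hand corner, orders 1..n."""
    return [det([row[:k] for row in m[:k]]) for k in range(1, len(m) + 1)]

# Hypothetical matrix of the form (92.3): a11, a23, a32 are nonzero.
A = [[1, 2, 1], [0, 0, 3], [0, 4, 0]]
p31 = 1   # must differ from 0, -a12/a32 = -1/2 and -a11/a13 = -1
P = [[1, 0, 0], [0, 1, 0], [p31, 0, 1]]
C = matmul(matmul(transpose(P), A), P)   # C = P'AP

minors_before = leading_minors(A)
minors_after = leading_minors(C)
```

The third minor is the same before and after, since det P = 1.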

F. A matrix A is skew-symmetric, the element $a_{12}$ is other than 0.
There is a nonsingular matrix P such that for the elements of the first
two columns of a matrix C = P'AP

$$c_{j1} = \begin{cases} a_{21}, & j = 2, \\ 0, & j \neq 2, \end{cases} \qquad c_{j2} = \begin{cases} a_{12}, & j = 1, \\ 0, & j \neq 1. \end{cases}$$

Since under a congruence transformation a skew-symmetric matrix
goes over into a skew-symmetric matrix, similar relations will hold
also for the elements of the first two rows of C. The matrix P can be
represented as a product $P = P_1 P_2$. The matrix $P_1$ differs from a
unit matrix in having a different second row, with

$$p^{(1)}_{2j} = \begin{cases} 0, & j = 1, \\ 1, & j = 2, \\ \dfrac{a_{j1}}{a_{12}}, & j > 2. \end{cases}$$

The matrix $P_2$ differs from a unit matrix only in having a different
first row, with

$$p^{(2)}_{1j} = \begin{cases} 1, & j = 1, \\ 0, & j = 2, \\ -\dfrac{a_{j2}}{a_{12}}, & j > 2. \end{cases}$$

Multiplying A on the left by $P_1'$ does not affect the first two rows and
the second column of A and makes zero all the elements of the first
column of $P_1'A$ save the first two. Multiplying $P_1'A$ on the left by $P_2'$
does not affect the first two rows and the first column of $P_1'A$ and
makes zero all the elements of the second column of P'A save the
first two. Multiplying P'A on the right by P does not affect the first
two columns of P'A.
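
A small numerical check of step F may be helpful. The sketch below is an illustration under our own assumptions: the helper names and the sample 4 × 4 skew-symmetric matrix are hypothetical, and the rows of $P_1$ and $P_2$ are filled with $a_{j1}/a_{12}$ and $-a_{j2}/a_{12}$ for j > 2, the values that make the first two columns of P'A vanish below the second row.

```python
from fractions import Fraction

def identity(n):
    return [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

def step_f_matrix(a):
    """P = P1 * P2 for a skew-symmetric matrix with a_12 != 0."""
    n = len(a)
    a12 = Fraction(a[0][1])
    if a12 == 0:
        raise ValueError("step F needs a nonzero element a_12")
    p1, p2 = identity(n), identity(n)
    for j in range(2, n):
        p1[1][j] = Fraction(a[j][0]) / a12    # second row of P1:  a_j1 / a_12
        p2[0][j] = -Fraction(a[j][1]) / a12   # first row of P2:  -a_j2 / a_12
    return matmul(p1, p2)

# Hypothetical skew-symmetric sample matrix.
A = [[0, 2, 1, -3],
     [-2, 0, 4, 5],
     [-1, -4, 0, 6],
     [3, -5, -6, 0]]
P = step_f_matrix(A)
C = matmul(matmul(transpose(P), A), P)   # C = P'AP
```

Here C splits into two skew-symmetric 2 × 2 diagonal blocks, the first of which contains the original elements $a_{12}$ and $a_{21}$.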

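G. Suppose a matrix A has for some partition into blocks the
structure

$$A = \begin{pmatrix} A_{11} & A_{12} \\ 0 & A_{22} \end{pmatrix}, \qquad (92.4)$$

where $A_{11}$ and $A_{22}$ are square blocks. If $P_{22}$ is a nonsingular matrix
whose size is that of $A_{22}$, then the matrix

$$C = \begin{pmatrix} A_{11} & A_{12} P_{22} \\ 0 & P_{22}' A_{22} P_{22} \end{pmatrix}$$

is congruent with A. Moreover, C = P'AP, where

$$P = \begin{pmatrix} E & 0 \\ 0 & P_{22} \end{pmatrix}.$$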



A direct check of all the assertions made in the descriptions of the 
auxiliary steps presents no particular difficulty, and it is left for 
the reader as an exercise to show their validity. 

The method as a whole is carried out as follows. At the first basic
step the matrix A is reduced to the form (92.4), where $A_{11}$ is a
nonsingular 1 × 1 or 2 × 2 matrix. If a matrix $A_k$, k ≥ 1, is of the
form (92.4), then at the next basic step the matrix in the lower
right-hand corner is also reduced to the form (92.4) and a general
congruence transformation is carried out according to step G. The matrix
$A_{k+1}$ can again be represented in the form (92.4) but the block in the
upper left-hand corner will be not only nonsingular for it but will
also have a greater size than for the matrix $A_k$. The process is
repeated until at some step in the matrix $A_s$ there appears in (92.4) a
zero block in the lower right-hand corner or the size of the block in
the upper left-hand corner is n × n. The resulting transformation
matrix is a left-to-right product of the transformation matrices of
all the steps.

The form of $A_s$ depends on whether the matrix A is skew-symmetric
or not. So does the composition of the basic steps and the
auxiliary steps.

Whatever the structure of a basic step, its aim is to obtain the next
portion of zeros in the matrix to be transformed. If the original
matrix is not skew-symmetric, then zeros are always obtained using an
auxiliary step A, and steps B to C are necessary only for it to be
prepared. But if the original matrix is skew-symmetric, then zeros are
obtained using step F, and it is step D that is preparatory. We
describe the basic step of the method also in terms of the transformation
of a matrix A and we begin with a nonskew-symmetric matrix A.

At the first basic step the matrix to be transformed is
nonskew-symmetric. If the element $a_{11} \neq 0$ and all off-diagonal elements of
the first column are zero, then nothing changes and we assume that
the basic step has been carried out. We take a unit matrix as a
transformation matrix P. In general, however, we carry out the first of
the auxiliary steps A to C that can be made. If this happens to be
step B or C, then after it we must carry out step A or both steps, B
and A. We take as a transformation matrix P a left-to-right product
of all transformation matrices of the actually made auxiliary steps.
As a result of the first basic step, in the transformed matrix $A_1$ all
the off-diagonal elements of the first column will be zero, i.e. $A_1$
will have a block structure of the form (92.4).

The difference of all the other steps from the first is due to the fact
that the matrix to be transformed may turn out to be skew-symmetric.
If it is not, then the basic step to be made next does not differ
in anything from the first. If, however, the matrix to be transformed
is skew-symmetric, then under any congruence transformation it
remains skew-symmetric and it is impossible to obtain a nonzero
element in the upper left-hand corner using this matrix alone. A way
out is based on transforming an extended lower diagonal block.

Until a skew-symmetric matrix is found, the block in the upper
left-hand corner of (92.4) for matrices $A_k$ will be a right triangular
matrix with nonzero diagonal elements. If the elements in positions
(1, 2) and (2, 1) of the skew-symmetric matrix are nonzero in the
lower right-hand corner, then for the matrix $A_k$ which is the next to
be transformed we change representation (92.4) by decreasing by
unity the size of the block in the upper left-hand corner. Now the
3 × 3 matrix in the upper left-hand corner of the new lower diagonal
block will have the form (92.3) and we can carry out an auxiliary
step E. After that it is possible to carry out three times in succession
step A. Indeed, as we have noted, making step A does not affect the
principal minors of the matrix. Hence in the given case, after
carrying out step A, in the lower right-hand corner of the new matrix
the first two principal minors will be nonzero. It is clearly possible
therefore to make another step A. A similar reasoning shows that
step A can be carried out a third time. Having made a step "back"
we were enabled to move three steps "forward". If necessary, step D
is carried out before step E.

Thus, if A is not a skew-symmetric matrix, then the above method
allows us to construct a nonsingular matrix P such that the congruent
matrix P'AP will have the following structure:

$$P'AP = \begin{pmatrix} M & N \\ 0 & 0 \end{pmatrix}. \qquad (92.5)$$

Here M is a right triangular matrix with nonzero diagonal elements
and the size of M is equal to the rank of A.

If A is a skew-symmetric matrix, then all the basic steps of the
method, including the first, are carried out according to the same
scheme. Suppose we have already obtained a matrix $A_k$ of the form
(92.4) and there is a nonsingular block-diagonal matrix with
skew-symmetric 2 × 2 blocks in the upper left-hand corner. Since under
a congruence transformation a skew-symmetric matrix goes over into
a skew-symmetric matrix, the block $A_{12}$ in (92.4) is zero. We first
have to obtain nonzero elements in positions (1, 2) and (2, 1) of
the skew-symmetric matrix in the lower right-hand corner. It is
possible that doing this will require an auxiliary step D. We then
carry out step F, which adds to the diagonal another nonsingular
skew-symmetric 2 × 2 block, and proceed to the next basic step. Now too
the process is continued until at some step in the matrix $A_s$ there
appears in representation (92.4) a zero block in the lower right-hand
corner or the size of the block in the upper left-hand corner is n × n.

So if A is a skew-symmetric matrix, then in this case the method
allows us to construct a nonsingular matrix P such that P'AP has
the following structure:

$$P'AP = \begin{pmatrix} M & 0 \\ 0 & 0 \end{pmatrix}. \qquad (92.6)$$

Here M is a block-diagonal matrix with nonsingular skew-symmetric
2 × 2 blocks. The size of M equals the rank of A.

For a Hermitian congruence transformation the general scheme
of the method remains the same. The process, however, turns out
to be even simpler than for the ordinary congruence transformation
if the auxiliary step C is replaced by the following.

C'. In a matrix A all diagonal elements are zero but there are
indices j, l, where j < l, such that among the elements $a_{jl}$ and $a_{lj}$
there is at least one nonzero element. There is a nonsingular matrix P
such that for a matrix C = P*AP one of the diagonal elements $c_{jj}$
and $c_{ll}$ is nonzero. That is, $c_{jj} = a_{jl} + a_{lj}$ and $c_{ll} = i\,(a_{jl} - a_{lj})$.
The matrix P differs from a unit matrix in two elements, $p_{lj} = 1$
and $p_{jl} = -i$. Multiplying A on the right by P adds to the jth column
of A its lth column and to the lth column of A its jth column
multiplied by $-i$. Multiplying AP on the left by P* adds to the jth row
of AP its lth row and to the lth row of AP its jth row multiplied by i.
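
The following sketch illustrates why step C' succeeds where step C cannot be applied. It is only an illustration under our own assumptions: the helper names and the 2 × 2 sample matrix are hypothetical. For this matrix the sum of the two off-diagonal elements is zero, so the first diagonal element of C stays zero, but the second one, which equals i times the difference of the off-diagonal elements, does not.

```python
def conj_transpose(m):
    # Hermitian conjugate: transpose and take complex conjugates.
    return [[x.conjugate() for x in col] for col in zip(*m)]

def matmul(x, y):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]

# Hypothetical sample: zero diagonal and a_12 + a_21 = 0.
A = [[0, 1], [-1, 0]]

# Unit matrix changed in two elements: p_21 = 1 and p_12 = -i (here j = 1, l = 2).
P = [[1, -1j], [1, 1]]

C = matmul(matmul(conj_transpose(P), A), P)   # C = P*AP
c_jj, c_ll = C[0][0], C[1][1]
```

With these data `c_jj` is zero and `c_ll` is a nonzero pure imaginary number, so step A (in its Hermitian variant) can be applied after interchanging the two rows and columns.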

Now there is no need for steps D to F of the general method, as we
shall never go beyond step C'. Moreover, formulas (92.2) remain
unchanged.

Thus, if A is a nonzero matrix, then the method allows us to
construct a nonsingular matrix P such that the matrix P*AP
Hermitian-congruent with A will have the following structure:

$$P^*AP = \begin{pmatrix} M & N \\ 0 & 0 \end{pmatrix}. \qquad (92.7)$$

Here M is a right triangular matrix with nonzero diagonal elements.
The size of M equals the rank of A.

The forms of matrices (92.5) to (92.7) are called canonical forms
for the operations of congruence transformation. Any basis in which
the original matrix has such a form is also called canonical. Matrices
of the forms (92.5) and (92.7) are by themselves called right
trapezoidal matrices. Left trapezoidal matrices are defined similarly.

Note a number of interesting conclusions arising from the
canonical forms of matrices. As we have already said, a congruence
transformation preserves the symmetry and skew-symmetry of matrices.
If one of these properties is a feature of the original matrix, it must
be inherited by the canonical form. In addition to what has been said
it can be concluded therefore that

A symmetric matrix is congruent with a diagonal matrix.

A Hermitian matrix is Hermitian-congruent with a real diagonal
matrix.

A skew-Hermitian matrix is Hermitian-congruent with a pure
imaginary diagonal matrix.

In all these cases reduction to canonical form is effected
particularly simply, since there cannot arise a need to carry out even one
of the auxiliary steps D to F.

For matrices of the canonical forms (92.5) and (92.6) we may
perform yet another congruence transformation with a diagonal matrix
and have the nonzero elements determining the nonsingularity of the
block M equal either +1 or -1. Such a canonical form of a matrix
and the corresponding basis are called normal. It is clear that
multiplying on the right (left) by a diagonal matrix results in
multiplying the columns (rows) by the diagonal elements of the
transformation matrix. We again describe the transformation in terms of
the auxiliary step with a matrix A.

H. A real nonskew-symmetric matrix A of rank r has the canonical
form (92.5). There is a real diagonal matrix P such that the nonzero
diagonal elements $c_{jj}$ of a matrix C = P'AP are equal to sgn $a_{jj}$,
with

$$p_{jj} = \begin{cases} |a_{jj}|^{-1/2}, & j \le r, \\ 1, & j > r. \end{cases}$$

A real (complex) skew-symmetric matrix A of rank r has the
canonical form (92.6). There is a real (complex) diagonal matrix P such
that the nonzero upper off-diagonal elements of the matrix C = P'AP
equal +1 and the nonzero lower off-diagonal elements equal -1,
with

$$p_{jj} = \begin{cases} a_{j,j+1}^{-1}, & j \text{ is odd}, \ j < r, \\ 1, & j \text{ is even or } j > r. \end{cases}$$

A complex nonskew-symmetric matrix A of rank r has the
canonical form (92.5). There is a complex diagonal matrix P such that
the nonzero diagonal elements $c_{jj}$ of C = P'AP equal 1, with

$$p_{jj} = \begin{cases} a_{jj}^{-1/2}, & j \le r, \\ 1, & j > r. \end{cases}$$

A Hermitian congruence transformation with a diagonal matrix is
rarely employed, since it can change only the absolute values of
the elements determining the nonsingularity of the block M in (92.7)
but cannot make the complex diagonal elements real.


Exercises 


1. Prove that if reduction to canonical form using a matrix P is
carried out according to the above method, then det P = ±1.

2. What does the matrix equation


G-Dcnc J)-( 


1 J- i 0 \ 

2 ill 


(92.8) 


mean in terms of the canonical form?

3. To what form can a nonskew-symmetric matrix be reduced using
congruence transformation if the auxiliary step E is excluded?

4. What form would a transformation matrix P have if each basic step of
the above method consisted only of the auxiliary step A?

5. Prove that any right triangular matrix is congruent with a left triangular 
matrix. What is the simplest form of the transformation matrix? 

6. Prove that any nonsingular matrix of odd size is congruent with a
nonsingular right triangular matrix.

7. Let G be the matrix of a positive definite bilinear form. Prove that
for its elements $g_{ij}$, for all i and all j ≠ i,

$$g_{ii} > 0, \qquad (g_{ij} + g_{ji})^2 < 4 g_{ii} g_{jj}.$$

8. Let G be the matrix of a negative definite bilinear form. Prove that
for its elements $g_{ij}$, for all i and all j ≠ i,

$$g_{ii} < 0, \qquad (g_{ij} + g_{ji})^2 < 4 g_{ii} g_{jj}.$$

9. Prove that the matrices of all symmetric positive (negative) definite 
bilinear forms are congruent. 

10. Prove that for a matrix G to be the matrix of an alternating bilinear form 
it is sufficient that there should be diagonal elements with opposite signs in it. 


93. Congruence and matrix decompositions 

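The general method of congruence transformation of a matrix to
canonical form does not always make it possible to predict the form
the coordinate transformation matrix for a change to canonical basis
will have. Under some additional constraints on the original matrix,
however, this question can be given a quite definite answer.

Suppose that in a matrix A all principal minors, except perhaps
the highest-order minor, i.e. the determinant of A, are nonzero. We
show that it is always possible to represent such a matrix as a product

$$A = LDU, \qquad (93.1)$$

where L is a left triangular matrix with unit diagonal elements, D
is a diagonal matrix and U is a right triangular matrix with unit
diagonal elements, i.e.

$$L = \begin{pmatrix} 1 & & & 0 \\ l_{21} & 1 & & \\ \vdots & \vdots & \ddots & \\ l_{n1} & l_{n2} & \dots & 1 \end{pmatrix}, \quad D = \begin{pmatrix} d_{11} & & & 0 \\ & d_{22} & & \\ & & \ddots & \\ 0 & & & d_{nn} \end{pmatrix}, \quad U = \begin{pmatrix} 1 & u_{12} & \dots & u_{1n} \\ & 1 & \dots & u_{2n} \\ & & \ddots & \vdots \\ 0 & & & 1 \end{pmatrix}.$$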

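Equating the elements of A and of the product LDU we get

$$a_{ij} = \sum_{p=1}^{\min(i,j)} l_{ip} d_{pp} u_{pj}. \qquad (93.2)$$

Now we find from (93.2) successively all the unknown elements of
the matrices of decomposition (93.1). Namely,

$$\begin{aligned}
d_{11} &= a_{11}, \qquad u_{1j} = \frac{a_{1j}}{d_{11}}, \qquad l_{j1} = \frac{a_{j1}}{d_{11}}, \qquad j > 1, \\
d_{ii} &= a_{ii} - \sum_{p=1}^{i-1} l_{ip} d_{pp} u_{pi}, \qquad i > 1, \\
u_{ij} &= \frac{a_{ij} - \sum_{p=1}^{i-1} l_{ip} d_{pp} u_{pj}}{d_{ii}}, \qquad l_{ji} = \frac{a_{ji} - \sum_{p=1}^{i-1} l_{jp} d_{pp} u_{pi}}{d_{ii}}, \qquad i > 1, \ j > i.
\end{aligned} \qquad (93.3)$$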


We apply to (93.1) the Binet-Cauchy formula. Recall that among
the minors of the left triangular matrix L in the first r rows only the
principal minor is nonzero, and it is equal to unity. A similar assertion
holds for the matrix U, if the rows are replaced by columns, of course.
Therefore

$$A\binom{1\ 2\ \dots\ r}{1\ 2\ \dots\ r} = \sum_{k_1 < \dots < k_r} L\binom{1\ 2\ \dots\ r}{k_1\ k_2\ \dots\ k_r} DU\binom{k_1\ k_2\ \dots\ k_r}{1\ 2\ \dots\ r} = DU\binom{1\ 2\ \dots\ r}{1\ 2\ \dots\ r} = d_{11} d_{22} \dots d_{rr}.$$

Hence

$$d_{11} = a_{11}, \qquad d_{ii} = \frac{A\binom{1\ 2\ \dots\ i}{1\ 2\ \dots\ i}}{A\binom{1\ 2\ \dots\ i-1}{1\ 2\ \dots\ i-1}}, \quad i > 1. \qquad (93.4)$$

Under the assumption the principal minors of A are nonzero.
Therefore so are all the diagonal elements $d_{ii}$ in (93.4) except perhaps the
last one.
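
The recurrences for the elements of L, D and U translate directly into a short program. The following is a minimal sketch, assuming exact rational arithmetic; the function name `ldu` and the sample matrix are our own and serve only as an illustration. At step i the element $d_{ii}$ is found first and then the ith row of U and the ith column of L.

```python
from fractions import Fraction

def ldu(a):
    """Return (L, d, U) with A = L * diag(d) * U, L and U unit triangular."""
    n = len(a)
    a = [[Fraction(x) for x in row] for row in a]
    lower = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    upper = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    d = []
    for i in range(n):
        d_ii = a[i][i] - sum(lower[i][p] * d[p] * upper[p][i] for p in range(i))
        if d_ii == 0 and i < n - 1:
            raise ValueError("a principal minor other than det A is zero")
        d.append(d_ii)
        for j in range(i + 1, n):
            upper[i][j] = (a[i][j] - sum(lower[i][p] * d[p] * upper[p][j]
                                         for p in range(i))) / d_ii
            lower[j][i] = (a[j][i] - sum(lower[j][p] * d[p] * upper[p][i]
                                         for p in range(i))) / d_ii
    return lower, d, upper

# Hypothetical sample matrix with nonzero principal minors 2, 2, 4.
A = [[2, 1, 1], [4, 3, 3], [8, 7, 9]]
L, d, U = ldu(A)
```

For this matrix the diagonal elements are the ratios of successive principal minors, 2, 2/2 and 4/2, in agreement with the Binet-Cauchy argument above. Only the last diagonal element is allowed to vanish, since no division by it is ever needed.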

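We shall fairly often deal with decompositions (93.1) for
symmetric and Hermitian matrices. If again all the principal minors of A,
except perhaps the last one, are nonzero, then a symmetric matrix
can always be represented as a product

$$A = S'DS \qquad (93.5)$$

and a Hermitian matrix can be represented as a product

$$A = S^*DS. \qquad (93.6)$$

Here S is a right triangular matrix with unit diagonal elements
and D is a diagonal matrix, i.e.

$$S = \begin{pmatrix} 1 & s_{12} & \dots & s_{1n} \\ & 1 & \dots & s_{2n} \\ & & \ddots & \vdots \\ 0 & & & 1 \end{pmatrix}, \qquad D = \begin{pmatrix} d_{11} & & & 0 \\ & d_{22} & & \\ & & \ddots & \\ 0 & & & d_{nn} \end{pmatrix}.$$

Completely in accordance with (93.3) we now have

$$\begin{aligned}
d_{11} &= a_{11}, \qquad s_{1j} = \frac{a_{1j}}{d_{11}}, \qquad j > 1, \\
d_{ii} &= a_{ii} - \sum_{p=1}^{i-1} d_{pp} s_{pi}^2, \qquad i > 1, \\
s_{ij} &= \frac{a_{ij} - \sum_{p=1}^{i-1} d_{pp} s_{pi} s_{pj}}{d_{ii}}, \qquad j > i > 1
\end{aligned} \qquad (93.7)$$

for decomposition (93.5) and

$$\begin{aligned}
d_{11} &= a_{11}, \qquad s_{1j} = \frac{a_{1j}}{d_{11}}, \qquad j > 1, \\
d_{ii} &= a_{ii} - \sum_{p=1}^{i-1} d_{pp} |s_{pi}|^2, \qquad i > 1, \\
s_{ij} &= \frac{a_{ij} - \sum_{p=1}^{i-1} d_{pp} \bar{s}_{pi} s_{pj}}{d_{ii}}, \qquad j > i > 1
\end{aligned}$$

for decomposition (93.6). Formulas (93.4) remain valid.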
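
For a real symmetric matrix the decomposition A = S'DS can be computed in the same manner, with only the rows of S to be found. The sketch below is an illustration under our own assumptions: the function name `sds` and the sample matrix are hypothetical, and exact rational arithmetic is used.

```python
from fractions import Fraction

def sds(a):
    """Return (S, d) with A = S' * diag(d) * S for a symmetric matrix A."""
    n = len(a)
    a = [[Fraction(x) for x in row] for row in a]
    s = [[Fraction(int(i == j)) for j in range(n)] for i in range(n)]
    d = []
    for i in range(n):
        d_ii = a[i][i] - sum(d[p] * s[p][i] ** 2 for p in range(i))
        if d_ii == 0 and i < n - 1:
            raise ValueError("a principal minor other than det A is zero")
        d.append(d_ii)
        for j in range(i + 1, n):
            s[i][j] = (a[i][j] - sum(d[p] * s[p][i] * s[p][j]
                                     for p in range(i))) / d_ii
    return s, d

# Hypothetical symmetric sample matrix.
A = [[4, 2, -2], [2, -3, 1], [-2, 1, 5]]
S, d = sds(A)

# Rebuild A from the factors as a check: a_ij = sum_p s_pi * d_pp * s_pj.
rebuilt = [[sum(S[p][i] * d[p] * S[p][j] for p in range(3)) for j in range(3)]
           for i in range(3)]
```

The diagonal of D here contains two positive elements and one negative element; the product of all three equals det A.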

Decompositions (93.1), (93.5) and (93.6) are extensively used to
solve diverse problems of linear algebra. As to congruence
transformations of the matrix these decompositions lead to the following
relations:

$$((L^{-1})')' A (L^{-1})' = DU(L^{-1})', \qquad ((L^{-1})^*)^* A (L^{-1})^* = DU(L^{-1})^*,$$

$$(S^{-1})' A S^{-1} = D, \qquad (S^{-1})^* A S^{-1} = D.$$

The matrices $DU(L^{-1})'$ and $DU(L^{-1})^*$ are right triangular and the
matrices D are diagonal; the zero element on their principal diagonals
may be only the last. Therefore we have again obtained the already
familiar canonical forms of matrices under a congruence
transformation. Now we can say, however, that coordinate transformation
matrices for a change to canonical basis are right triangular, since
so are the matrices $(L^{-1})'$ and $S^{-1}$. The above decompositions
themselves give coordinate transformation matrices L' and S for a change
from the canonical basis to the original basis, which are also right
triangular.

In the case of a symmetric matrix the described process of
decomposition is closely related to the so-called Jacobi algorithm for
transforming a quadratic form to canonical form. The only difference
is that in the Jacobi algorithm we find the matrix $S^{-1}$ instead of S.
Notice that S is much easier to find than $S^{-1}$.

Congruence transformations with a right triangular matrix are 
among the simplest, but still sufficiently general transformations to 
be applied to a wide class of matrices. Of certain interest therefore is 
a description of the class of matrices that can be reduced to canonical 
form using transformations with a right triangular matrix. 

Lemma 93.1. If a rectangular matrix A is representable in the
block form

$$A = \begin{pmatrix} B & Q \\ R & T \end{pmatrix}, \qquad (93.8)$$

where B is a square nonsingular r × r matrix, then the rank of A is r if
and only if

$$T = RB^{-1}Q. \qquad (93.9)$$

Proof. We multiply A on the left by a nonsingular block matrix

$$V = \begin{pmatrix} E & 0 \\ -RB^{-1} & E \end{pmatrix},$$

where the corresponding blocks have the same size as in (93.8). Then

$$VA = \begin{pmatrix} B & Q \\ 0 & T - RB^{-1}Q \end{pmatrix}.$$

The matrices A and VA have the same rank but it is equal to r if
and only if $T - RB^{-1}Q = 0$.

Now we can describe the desired class of matrices. It turns out to 
be closely related to matrices of the form (93.8) and (93.9). 

Theorem 93.1. For a nonskew-symmetric matrix A to be reducible
to canonical form using a congruence transformation with a right
triangular matrix it is necessary and sufficient that the number of the first
nonzero principal minors of A should equal its rank.

Proof. Necessity. Let a nonskew-symmetric matrix A be reducible
to canonical form (92.5) using a right triangular matrix P. It is
clear that the number of the first nonzero principal minors in A
cannot be greater than the size of the block M. Applying the
Binet-Cauchy formula and considering that there is no nonzero minor in
the first columns of P, except the principal minor, we get

$$A\binom{1\ 2\ \dots\ s}{1\ 2\ \dots\ s} \left[ P\binom{1\ 2\ \dots\ s}{1\ 2\ \dots\ s} \right]^2 = M\binom{1\ 2\ \dots\ s}{1\ 2\ \dots\ s}$$

for every s not greater than the size of M. Since the principal minors
of M and P are nonzero, the number of the first nonzero principal
minors of A equals its rank.

Sufficiency. Suppose the number of the first nonzero principal
minors of A and its rank equal r. We represent A in the block
form (93.8), where the size of the block B is equal to r × r. Since
all principal minors of the matrix B are nonzero, it follows from the
foregoing that it can be represented as B = LDU similarly to (93.1).
We construct a block matrix

$$P = \begin{pmatrix} (L^{-1})' & -(B^{-1})'R' \\ 0 & E \end{pmatrix}.$$

A direct check, using (93.9), shows that

$$P'AP = \begin{pmatrix} DU(L^{-1})' & L^{-1}\left(-B(B^{-1})'R' + Q\right) \\ 0 & 0 \end{pmatrix}.$$

The matrix $DU(L^{-1})'$ is nonsingular right triangular and P is
nonsingular right triangular, and hence A is reducible in the required way
to canonical form.

For a congruence transformation of a skew-symmetric matrix and 
a Hermitian congruence transformation of an arbitrary matrix the 
corresponding statements can be proved in a similar way and we 
shall restrict ourselves to their formulation. 

Theorem 93.2. For a skew-symmetric matrix A of rank r to be
reducible to canonical form using a congruence transformation with a right
triangular matrix it is necessary and sufficient that the number of the
first nonzero principal minors of even order of A should equal r/2.

Theorem 93.3. For a matrix A to be reducible to canonical form
using a Hermitian congruence transformation with a right triangular
matrix it is necessary and sufficient that the number of the first
nonzero principal minors of A should equal its rank.

Congruence and Hermitian congruence transformations of a
matrix are not in general similarity transformations. However, if for
some class of matrices P one of the following groups of relations holds

$$PP' = P'P = E, \qquad PP^* = P^*P = E, \qquad (93.10)$$

then the congruence transformation becomes a similarity
transformation and in order to make studies we may use the earlier obtained
results relating to the similarity of matrices. As we already know,
the first group of relations in (93.10) is satisfied by real orthogonal
matrices, and the second group of relations is satisfied by complex
unitary matrices. Recalling the results of Sections 76 to 81 for
orthogonal and unitary similarities we therefore conclude that the
following statements are true:

Any real symmetric or skew-symmetric matrix can be reduced to
canonical form using a congruence transformation with an orthogonal matrix.

Any complex matrix can be reduced to canonical form using a
Hermitian congruence transformation with a unitary matrix.

These statements are mainly of theoretical interest, since in
practice orthogonal and unitary transformation matrices are difficult to
find, especially for n ≥ 5.


Exercises 


1. Prove that if decompositions (93.1), (93.5) and (93.6) exist, then
they are unique.

2. Prove that if all the minors of a matrix A in the lower right-hand corner
(except perhaps the minor of the highest order) are nonzero, then there is a
unique decomposition A = LDU, where L is a right triangular matrix, U is a left
triangular matrix, each with unit diagonal elements, and D is a diagonal matrix.

3. Prove that for the elements $d_{ii}$ of the matrix D of Exercise 2

$$d_{nn} = a_{nn}, \qquad d_{ii} = \frac{A\binom{i\ \ i+1\ \dots\ n}{i\ \ i+1\ \dots\ n}}{A\binom{i+1\ \ i+2\ \dots\ n}{i+1\ \ i+2\ \dots\ n}}, \quad i < n.$$


4. Into what triangular factors can a matrix be factored if its minors in the 
lower left-hand (upper right-hand) corner are nonzero? 

5. Suppose for the elements $a_{ij}$ of a matrix A

$$a_{ij} = 0, \quad k < j - i, \quad j - i < l, \qquad (93.11)$$

given some numbers l < k. Such a matrix is called a band matrix. Prove that
if for a band matrix A decomposition (93.1) holds, then

$$l_{ij} = 0, \quad j - i < l, \qquad u_{ij} = 0, \quad j - i > k.$$

6. A matrix A is said to be tridiagonal if it satisfies conditions (93.11) for
k = 1 and l = -1. What form do formulas (93.3) and (93.7) take for a tridiagonal
matrix?

7. A matrix A is said to be right (left) almost triangular if it satisfies
conditions (93.11) for k = n and l = -1 (k = 1 and l = -n). What form do
formulas (93.3) take for almost triangular matrices?

8. What number of arithmetical operations is required for the various forms 
of matrices to obtain decompositions of the type (93.1)? 

9. How are decompositions (93.1), (93.5) and (93.6) to be applied to solve 
systems of linear algebraic equations? 


94. Symmetric bilinear forms 

Discussing bilinear and quadratic forms we
have often paid particular attention both to symmetric bilinear forms
and to bilinear forms generating real quadratic forms. Only two
kinds of bilinear forms simultaneously satisfy both conditions; these
are the real symmetric and the Hermitian-symmetric bilinear form.
The matrices of these forms are in any basis a real symmetric or
a Hermitian matrix respectively. A congruence transformation
reduces both forms of matrices to a diagonal real normal form.

As we have seen, various congruence transformations can reduce
the same matrix to canonical form. In general therefore the canonical
form is not uniquely defined. The question naturally arises: What
do the different canonical forms to which the same matrix is
reducible have in common? We know that the rank of a matrix does not
depend on transformation. Whatever the method of reducing to
canonical form is therefore, the number of the last zero rows will be the
same. Much more can be said concerning the real symmetric and the
Hermitian matrix. The canonical form of these matrices can be
characterized by the number of positive and negative terms it
contains. There is an important

Theorem 94.1 (the law of inertia for quadratic forms). The number
of positive terms and that of negative terms in the canonical form of
a real symmetric matrix under the ordinary congruence transformation
and of a Hermitian matrix under a Hermitian congruence
transformation do not depend on the method of reduction.

Proof. Let some matrix A satisfy the hypotheses of the theorem.
Consider a quadratic form F with a matrix A of rank r in variables
$x_1, x_2, \dots, x_n$ and suppose that two methods have been used to
reduce it to the normal form

$$F = y_1^2 + \dots + y_k^2 - y_{k+1}^2 - \dots - y_r^2 = z_1^2 + \dots + z_l^2 - z_{l+1}^2 - \dots - z_r^2. \qquad (94.1)$$

Since the change from the variables $x_1, x_2, \dots, x_n$ to $y_1, y_2, \dots, y_n$
was effected using a nonsingular linear transformation, the new
variables will be linearly expressible in terms of the old variables,
with the determinant of the inverse transformation matrix nonzero.
So

$$y_i = \sum_{s=1}^{n} b_{is} x_s. \qquad (94.2)$$

Similarly

$$z_j = \sum_{t=1}^{n} c_{jt} x_t. \qquad (94.3)$$

Suppose that k < l and write the system of equations

$$y_1 = y_2 = \dots = y_k = z_{l+1} = \dots = z_n = 0. \qquad (94.4)$$

If the left-hand sides of these equations are replaced by their
expressions in (94.2) and (94.3), a system of n - l + k homogeneous
linear equations in n unknowns $x_1, x_2, \dots, x_n$ is obtained. The number
of equations in that system is smaller than that of unknowns, so the
system has a nonzero real solution $\alpha_1, \alpha_2, \dots, \alpha_n$.

Now replace in (94.1) all the variables by their expressions in
(94.2) and (94.3) and then substitute the numbers $\alpha_1, \alpha_2, \dots, \alpha_n$
for $x_1, x_2, \dots, x_n$. If for brevity we denote by $y_i(\alpha)$ and $z_j(\alpha)$ the
values of the variables $y_i$ and $z_j$ after such a substitution, then taking
into account (94.4) relation (94.1) becomes

$$-y_{k+1}^2(\alpha) - \dots - y_r^2(\alpha) = z_1^2(\alpha) + \dots + z_l^2(\alpha).$$

It follows that

$$z_1(\alpha) = \dots = z_l(\alpha) = 0. \qquad (94.5)$$

On the other hand, according to the choice of the numbers
$\alpha_1, \alpha_2, \dots, \alpha_n$ we have

$$z_{l+1}(\alpha) = \dots = z_r(\alpha) = \dots = z_n(\alpha) = 0. \qquad (94.6)$$

Thus the system of n homogeneous linear equations

$$z_i = 0, \quad i = 1, 2, \dots, n,$$

in n unknowns $x_1, x_2, \dots, x_n$ has, by (94.5) and (94.6), a nonzero
solution $\alpha_1, \alpha_2, \dots, \alpha_n$, i.e. the determinant of the system must be
zero. This contradicts (94.3). We arrive at a similar contradiction
assuming l < k. Hence l = k and the theorem is proved.

Any real ordinary (Hermitian) quadratic form in a real (complex)
vector space has a unique real symmetric (complex Hermitian)
matrix in any basis. These matrices satisfy the hypotheses of
Theorem 94.1. Whatever the basis, the numbers of positive and negative
terms in the canonical form are invariants of the quadratic form and
are called its positive and negative indices of inertia respectively.
The difference between the positive and negative indices is called the
signature of the quadratic form. We can now formulate some useful
corollaries of Theorem 94.1.

Corollary. A quadratic form is positive (negative) definite if and only
if the positive (negative) index of inertia is equal to $n$.

Corollary. A quadratic form is of constant signs if and only if one
of the indices of inertia is zero.
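
The law of inertia can be observed numerically. The following is a minimal sketch, not a method taken from the text: it assumes that every leading principal minor of the matrix is nonzero, so that symmetric elimination needs no pivoting, and it counts the signs of the resulting pivots. The function names `inertia` and `congruent` are hypothetical helpers introduced only for this illustration.

```python
from fractions import Fraction


def inertia(a):
    """Return (positive index, negative index) of a symmetric matrix.

    Assumption: every leading principal minor is nonzero, so the
    elimination never meets a zero pivot; otherwise ValueError is raised.
    """
    n = len(a)
    m = [[Fraction(v) for v in row] for row in a]
    pos = neg = 0
    for k in range(n):
        pivot = m[k][k]
        if pivot == 0:
            raise ValueError("zero pivot: a leading principal minor vanishes")
        if pivot > 0:
            pos += 1
        else:
            neg += 1
        # Eliminate column k below the pivot; the trailing block stays symmetric.
        for i in range(k + 1, n):
            factor = m[i][k] / pivot
            for j in range(k + 1, n):
                m[i][j] -= factor * m[k][j]
    return pos, neg


def congruent(p, a):
    """Return the matrix P'AP for square matrices given as lists of rows."""
    n = len(a)
    ap = [[sum(a[i][k] * p[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(p[k][i] * ap[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


A = [[2, 1, 0], [1, -1, 3], [0, 3, 1]]
P = [[1, 2, 0], [0, 1, 1], [1, 0, 1]]  # nonsingular, det P = 3
print(inertia(A), inertia(congruent(P, A)))
```

Both calls report the same pair of indices, as the theorem predicts for congruent matrices; the signature is the difference of the two numbers.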

The law of inertia permits some classification of real quadratic
forms to be given. We shall say that two quadratic forms are affinely
equivalent if for each of them we can choose a basis such that the
matrices of those quadratic forms are the same. In this case we shall
also say that a nonsingular transformation converts one quadratic
form into the other. It is easy to verify that the affine equivalence of
quadratic forms is an equivalence relation and that two quadratic
forms are equivalent if and only if their matrices are congruent in
the same basis. It follows from the law of inertia therefore that all
real quadratic forms in a vector space $K_n$ can be grouped into
nonoverlapping classes, each consisting only of affinely equivalent
quadratic forms. A class is characterized by rank and signature.
This grouping is called an affine classification of real quadratic forms.

Given any rank $r$ of quadratic forms a given classification always
has two “extreme” classes, the classes with signatures $+r$ and $-r$.
The first class comprises all nonnegative quadratic forms of rank $r$,
the second comprises all nonpositive quadratic forms of rank $r$.
Both classes taken together contain all rank-$r$ quadratic forms of
constant signs and only these forms.

In general the constancy of signs of a quadratic form is easy to
establish by reducing the form to canonical form in one of the ways
described above. Of considerable interest in some cases, however, are
also direct criteria of the constancy of signs. Taking into account
the great significance of these very quadratic forms we shall carry
out additional studies for them, confining our discussion mainly to
quadratic forms in a real space. We shall again assume that the
matrix of a quadratic form is real symmetric. For the case of a complex
space the results of the studies will be the same and the proofs
differ in minor details.

Theorem 94.2 (Sylvester’s criterion). For a quadratic form to be 
positive definite it is necessary and sufficient that all principal minors 
of the matrix of that form should be positive. 

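Proof. Necessity. Let a quadratic form with a matrix $A$ be positive
definite. Then there is a nonsingular transformation with a matrix $P$
reducing the form to a sum of squares. According to (91.9) this means
that $E = P'AP$ or $A = (P^{-1})'P^{-1}$. Using the Binet-Cauchy formula
we find for the principal minor of order $s$

$$A\begin{pmatrix} 1 & 2 & \dots & s \\ 1 & 2 & \dots & s \end{pmatrix} = \sum_{1 \le k_1 < k_2 < \dots < k_s \le n} \left( P^{-1}\begin{pmatrix} k_1 & k_2 & \dots & k_s \\ 1 & 2 & \dots & s \end{pmatrix} \right)^2,$$

where the upper and lower rows of indices list the rows and columns of
a minor. Since $P$ is nonsingular, so is $P^{-1}$, and its first $s$
columns contain at least one nonzero minor of order $s$. Hence for every
$s$ the right-hand side of the equation obtained is positive.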

Sufficiency. Suppose now that all principal minors of the matrix $A$
of some quadratic form are positive. We reduce that form to canonical
form using the transformation defined by formulas (93.7). Under
the hypotheses of the theorem and according to formulas (93.4)
all the coefficients of the canonical form will be positive, i.e. the
quadratic form is positive definite.

Corollary. For a quadratic form to be negative definite it is necessary 
and sufficient that all principal minors of odd order should be negative 
and all principal minors of even order should be positive. 

The proof follows from Sylvester’s criterion and from the fact
that if $A$ is the matrix of a negative definite quadratic form, then $-A$
is the matrix of a positive definite quadratic form.
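
Sylvester's criterion and its corollary translate directly into a test on the principal (leading) minors. The sketch below is only an illustration under the assumption that the input is a real symmetric matrix with exactly representable entries; the helper names `det`, `leading_minors`, `is_positive_definite` and `is_negative_definite` are hypothetical.

```python
from fractions import Fraction


def det(m):
    """Determinant by Gaussian elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in m]
    n, sign, result = len(m), 1, Fraction(1)
    for k in range(n):
        pivot_row = next((i for i in range(k, n) if m[i][k] != 0), None)
        if pivot_row is None:
            return Fraction(0)
        if pivot_row != k:
            m[k], m[pivot_row] = m[pivot_row], m[k]
            sign = -sign
        result *= m[k][k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n):
                m[i][j] -= factor * m[k][j]
    return sign * result


def leading_minors(a):
    """Principal minors of orders 1, 2, ..., n (upper-left corner blocks)."""
    return [det([row[:s] for row in a[:s]]) for s in range(1, len(a) + 1)]


def is_positive_definite(a):
    """Sylvester's criterion: all principal minors are positive."""
    return all(d > 0 for d in leading_minors(a))


def is_negative_definite(a):
    """Corollary: odd-order minors negative, even-order minors positive."""
    return all((d < 0) if s % 2 == 1 else (d > 0)
               for s, d in enumerate(leading_minors(a), start=1))


A = [[2, -1, 0], [-1, 2, -1], [0, -1, 2]]
print(leading_minors(A), is_positive_definite(A))
```

Exact fractions are used so that the sign tests are not affected by rounding; with floating-point data a tolerance would be needed.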

Theorem 94.3 (Jacobi’s criterion). For a quadratic form to be positive
definite it is necessary and sufficient that all coefficients of the
characteristic polynomial of the matrix of the form should be nonzero
and have alternating signs.

Proof. Necessity. As already noted, a transformation of variables
with an orthogonal matrix can reduce a given quadratic form to
canonical form, where the coefficients are the eigenvalues
$\lambda_1, \lambda_2, \dots, \lambda_n$ of the matrix of the form. Under
the hypotheses of the theorem the eigenvalues must be positive. The
characteristic polynomial $f(\lambda)$ equals

$$f(\lambda) = (\lambda - \lambda_1)(\lambda - \lambda_2) \dots (\lambda - \lambda_n) = \lambda^n + a_{n-1}\lambda^{n-1} + \dots + a_1\lambda + a_0$$

and all of its coefficients are nonzero and have alternating signs,
which is immediate from Vieta’s formulas for the coefficients $a_i$.

Sufficiency. Let the coefficients of the characteristic polynomial
be nonzero and have alternating signs. The roots of this polynomial
will be real as the eigenvalues of a symmetric matrix and it remains
to show that they are positive. Suppose this statement has been
proved for all polynomials of degree $n - 1$. Since all coefficients of
$f'(\lambda)$ are nonzero and have alternating signs, under the assumption
$f'(\lambda)$ has $n - 1$ positive roots. It is known from mathematical
analysis that if a polynomial has only real roots, then they are
separated by the roots of its derivative. Therefore $f(\lambda)$ has at
least $n - 1$ positive roots. The last root will also be positive since
the product of roots is positive.
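
Jacobi's criterion can be tried out without computing any eigenvalues. The sketch below obtains the coefficients of $\det(\lambda E - A)$ by the Faddeev-LeVerrier recurrence, which is a technique chosen here for convenience and not one described in the text, and then checks that they are nonzero with alternating signs. The matrix is assumed to be real symmetric; `char_poly` and `jacobi_positive_definite` are hypothetical names.

```python
from fractions import Fraction


def char_poly(a):
    """Coefficients [1, a_{n-1}, ..., a_0] of det(lambda*E - A).

    Uses the Faddeev-LeVerrier recurrence:
    M_k = A M_{k-1} + c_{n-k+1} E,  c_{n-k} = -tr(A M_k) / k,  M_0 = 0.
    """
    n = len(a)
    a = [[Fraction(v) for v in row] for row in a]
    coeffs = [Fraction(1)]
    m = [[Fraction(0)] * n for _ in range(n)]
    for k in range(1, n + 1):
        am = [[sum(a[i][t] * m[t][j] for t in range(n)) for j in range(n)]
              for i in range(n)]
        m = [[am[i][j] + (coeffs[-1] if i == j else 0) for j in range(n)]
             for i in range(n)]
        trace = sum(a[i][t] * m[t][i] for i in range(n) for t in range(n))
        coeffs.append(-trace / k)
    return coeffs


def jacobi_positive_definite(a):
    """True if all coefficients are nonzero and neighbours differ in sign."""
    c = char_poly(a)
    return (all(v != 0 for v in c)
            and all(c[i] * c[i + 1] < 0 for i in range(len(c) - 1)))


print(char_poly([[2, 1], [1, 2]]), jacobi_positive_definite([[2, 1], [1, 2]]))
```

For the matrix with rows (2, 1) and (1, 2) the polynomial is $\lambda^2 - 4\lambda + 3$, whose roots 1 and 3 are indeed positive.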

Criteria for nonnegative and nonpositive quadratic forms are
much more complicated, and this is mainly due to the fact that in
these cases the matrices of the forms are singular. One of the main
ways of investigating the constancy of signs of a quadratic form
involves reducing its matrix to a symmetric form (93.8) and (93.9)
and studying that form. Matrices of constant signs being closely
related, we restrict our consideration to nonnegative matrices only.

We shall say that a matrix $H$ is a permutation matrix if in each of
its rows and in each of its columns there is only one nonzero element
and all the nonzero elements equal unity. It is clear that multiplying
an arbitrary matrix $A$ on the right by a permutation matrix $H$
interchanges the columns in $A$ and multiplying on the left
interchanges the rows.

Lemma 94.1. For an arbitrary nonsingular matrix $A$ there is a
permutation matrix $H$ such that in the matrix $AH$ all principal minors
are nonzero.

Proof. The matrix $A$ is nonsingular. Hence there is at least one
nonzero element in its first row. Interchanging an appropriate column
with the first column makes the first-order principal minor nonzero.
Suppose that by interchanging the columns we have made all principal
minors up to the $k$th order nonzero. If interchanging the
last $n - k$ columns cannot yield a nonzero principal minor of order
$k + 1$, this means that in the first $k + 1$ rows of $A$ there is no
nonzero minor of order $k + 1$, i.e. that $A$ must be singular. This
contradiction proves the lemma.
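
The proof of the lemma is constructive, and its greedy column selection can be sketched as follows. This is an illustration under the assumption of exact arithmetic; the function `column_order` is a hypothetical helper that returns the order in which the columns of $A$ appear in $AH$, and `det` is an auxiliary routine written for this example.

```python
from fractions import Fraction


def det(m):
    """Determinant by Gaussian elimination over exact fractions."""
    m = [[Fraction(v) for v in row] for row in m]
    n, sign, result = len(m), 1, Fraction(1)
    for k in range(n):
        pivot_row = next((i for i in range(k, n) if m[i][k] != 0), None)
        if pivot_row is None:
            return Fraction(0)
        if pivot_row != k:
            m[k], m[pivot_row] = m[pivot_row], m[k]
            sign = -sign
        result *= m[k][k]
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n):
                m[i][j] -= factor * m[k][j]
    return sign * result


def column_order(a):
    """Column order making every principal minor of the permuted matrix nonzero.

    Follows the induction of the proof: at step k one of the remaining
    columns is moved to position k so that the minor of order k + 1 built
    on the first k + 1 rows is nonzero. A singular matrix raises ValueError.
    """
    n = len(a)
    order = list(range(n))
    for k in range(n):
        for j in range(k, n):
            trial = order[:k] + [order[j]]
            block = [[a[i][c] for c in trial] for i in range(k + 1)]
            if det(block) != 0:
                order[k], order[j] = order[j], order[k]
                break
        else:
            raise ValueError("matrix is singular")
    return order


print(column_order([[0, 0, 1], [1, 0, 0], [0, 1, 0]]))
```

Column `order[k]` of $A$ becomes column $k$ of $AH$, so the permutation matrix $H$ has a unit element in row `order[k]` of its column $k$.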

Theorem 94.4. For a quadratic form of rank $r$ with a matrix $A$ to
be nonnegative it is necessary and sufficient that there should be a
permutation matrix $H$ such that in the matrix $H'AH$ the first $r$
principal minors are positive.

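Proof. Necessity. Let a quadratic form of rank $r$ with a matrix $A$
be nonnegative. Then there is a nonsingular matrix $P$ such that
$A = (P^{-1})'E_rP^{-1}$, where $E_r$ is a diagonal matrix whose first $r$
elements equal unity and whose other elements are zero. According to
Lemma 94.1 there is a permutation matrix $H$ such that all principal
minors of the matrix $P^{-1}H$ are nonzero.

Using the Binet-Cauchy formula we find for $1 \le s \le r$

$$(H'AH)\begin{pmatrix} 1 & 2 & \dots & s \\ 1 & 2 & \dots & s \end{pmatrix} = \sum_{1 \le k_1 < k_2 < \dots < k_s \le r} \left( (P^{-1}H)\begin{pmatrix} k_1 & k_2 & \dots & k_s \\ 1 & 2 & \dots & s \end{pmatrix} \right)^2 > 0.$$

The sum is positive because it contains the square of the principal
minor of order $s$ of the matrix $P^{-1}H$, which is nonzero.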

Sufficiency. Suppose that for a quadratic form of rank r with 
a matrix A there is a permutation matrix H such that in the matrix 
H'AH the first r principal minors are positive. By Theorem 93.1 
H'AH can be reduced to canonical form using a transformation with 
a triangular matrix. According to (93.4) the nonzero coefficients of 
the canonical form of H'AH and hence of A are positive, i.e. the 
quadratic form is nonnegative.

As to quadratic forms that are not of constant signs, there are no
theorems closely similar to Theorems 94.1 and 94.4 for them. There
is only

Theorem 94.5. If a quadratic form has a symmetric matrix $A$ of the
form (93.8) and (93.9), then its indices of inertia coincide with those of
the “truncated” quadratic form defined by the matrix $B$ of (93.8).

Proof. By Theorem 93.1 the matrix $A$ can be reduced to canonical
form using a transformation with a right triangular matrix, with
relations (93.4) holding for the nonzero coefficients of the canonical
form. But the matrix $B$ of the “truncated” quadratic form also
satisfies the hypotheses of Theorem 93.1 and for the coefficients of its
canonical form we again have relations (93.4). Therefore the indices
of inertia of the quadratic forms defined by $A$ and $B$ coincide.

The particular interest we have shown for quadratic forms of constant
signs is accounted for by their vast area of application. One
of the major applications is the introduction of a metric in a vector
space. Any bilinear form polar to some positive definite quadratic
form may be regarded as a scalar product and hence we can turn a vector
space into a Euclidean or a unitary space, using it. The validity
of the axioms for these spaces is obvious. Of no less importance for
introducing a metric, especially a metric on subspaces, are also
nonnegative forms. As an example of using the constancy of signs we
prove the validity of

Theorem 94.6. For a nonsingular Hermitian bilinear form to be
reducible to diagonal form it is sufficient that its symmetric (or
skew-symmetric) part should be strictly of constant signs.

Proof. Consider the case of a positive definite symmetric part. Let $A$
be the matrix of the bilinear form. Then the matrix $\frac{1}{2}(A + A^*)$
will be the matrix of the symmetric part. Since the symmetric part
is positive definite, the matrix $\frac{1}{2}(A + A^*)$ is
Hermitian-congruent with a unit matrix. Hence there is a nonsingular
matrix $S$ such that

$$\tfrac{1}{2} S'(A + A^*)S = E. \tag{94.7}$$

We show that the matrix $S'AS$ is normal. From (94.7) we have

$$SS' = 2(A + A^*)^{-1}.$$

Therefore

$$\begin{aligned}
(S'AS)(S'AS)^* - (S'AS)^*(S'AS) &= S'(ASS'A^* - A^*SS'A)S \\
&= 2S'\left(A(A + A^*)^{-1}A^* - A^*(A + A^*)^{-1}A\right)S \\
&= 2S'\left((A^{*-1}(A + A^*)A^{-1})^{-1} - (A^{-1}(A + A^*)A^{*-1})^{-1}\right)S \\
&= 2S'\left((A^{*-1} + A^{-1})^{-1} - (A^{-1} + A^{*-1})^{-1}\right)S = 0.
\end{aligned}$$

By virtue of normality $S'AS$ is reducible to diagonal form using
a Hermitian congruence transformation with a unitary matrix.

Thus the matrix $A$ of a Hermitian bilinear form is
Hermitian-congruent with a diagonal matrix, which was to be shown. The
proofs of the other cases are similar.


Exercises 

1. Prove that if all principal minors of a real symmetric
or a complex Hermitian matrix are nonzero, then the number of its
positive and negative eigenvalues coincides respectively with that of
positive and negative terms of sequence (93.4).

2. Prove that if a matrix is positive definite, then any diagonal minor is 
positive. 

3. Prove that a symmetric matrix of rank r always has at least one diagonal 
minor of order r not equal to zero. 

4. Prove that the maximum element of a positive definite matrix is on the 
principal diagonal. 

5. Prove that $A$ is a positive definite matrix if for every $i$

$$|a_{ii}| > \sum_{j \ne i} |a_{ij}|.$$

6. Prove that for any symmetric matrix A of rank r there is a permutation 
matrix H such that among the first r principal minors of the matrix H'AH there 
are no two adjacent zero minors and the minor of order r is nonzero. 

7. Prove that the matrix $H'AH$ of Exercise 6 can be represented as
$H'AH = S'DS$, where $S$ is a right triangular matrix with unit diagonal
elements and $D$ is a block-diagonal matrix with $1 \times 1$ and
$2 \times 2$ blocks.

8. Prove that any nonnegative matrix of rank r can be represented as a sum 
of r nonnegative matrices of rank 1. 

9. Let $A$ and $B$ be positive definite matrices with elements $a_{ij}$
and $b_{ij}$. Prove that a matrix $C$ with elements $c_{ij} = a_{ij}b_{ij}$
is also positive definite.


95. Second-degree hypersurfaces 

Closely related to the study of real quadratic
forms is another study, that concerned with second-degree
hypersurfaces. Wishing to stress the geometrical character of many of
the properties of hypersurfaces, in what follows we shall nearly
always call vectors points of a space $R_n$.

A second-degree hypersurface $f$ in $R_n$ is a set of points whose
coordinates $x_1, x_2, \dots, x_n$ satisfy the equation

$$\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij} x_i x_j - 2 \sum_{k=1}^{n} b_k x_k + c = 0, \tag{95.1}$$

where $a_{ij}$, $b_k$ and $c$ are real numbers.

We simplify the notation. As in the case of quadratic forms, we
shall assume that a matrix $A$ with coefficients $a_{ij}$ is symmetric.
By $b$ we denote a vector with coordinates $b_1, b_2, \dots, b_n$. We
introduce in $R_n$ a scalar product as a sum of pairwise products of
coordinates. We can now regard a second-degree hypersurface $f$ in $R_n$
as a set of points $x$ of a Euclidean space $R_n$ satisfying the equation

$$(Ax, x) - 2(b, x) + c = 0 \tag{95.2}$$

or, $A$ being symmetric, the equation

$$(x, Ax) - 2(b, x) + c = 0.$$


To begin with, we consider relative positions of second-degree
hypersurfaces and straight lines. Take a straight line in $R_n$. Let it
pass through a point $x_0$ and have a direction vector $l$. Points $x$ of
that straight line are defined by

$$x = x_0 + lt \tag{95.3}$$

for all possible real numbers $t$. Substituting the expression for $x$ in
(95.2) we get

$$t^2 (Al, l) - 2t\left((b, l) - (Al, x_0)\right) + (Ax_0, x_0) - 2(b, x_0) + c = 0. \tag{95.4}$$

Thus the points of intersection of the straight line (95.3) with
hypersurface (95.2) are given by the roots of the quadratic
equation (95.4).
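
As a small illustration, the coefficients of the quadratic equation (95.4) can be assembled from scalar products and the equation solved for $t$. The sketch assumes real data given as Python lists and a direction with $(Al, l) \ne 0$; the names `dot`, `matvec` and `line_intersections` are hypothetical.

```python
import math


def dot(u, v):
    """Scalar product as a sum of pairwise products of coordinates."""
    return sum(p * q for p, q in zip(u, v))


def matvec(a, v):
    """Product of a matrix (list of rows) and a vector."""
    return [dot(row, v) for row in a]


def line_intersections(a, b, c, x0, l):
    """Real roots t of (95.4) for the line x = x0 + l*t, in increasing order.

    Returns an empty list when the line misses the hypersurface and
    two equal values when the ends of the chord coincide.
    """
    al = matvec(a, l)
    q2 = dot(al, l)
    if q2 == 0:
        raise ValueError("(Al, l) = 0: the equation is not quadratic in t")
    q1 = -2 * (dot(b, l) - dot(al, x0))
    q0 = dot(matvec(a, x0), x0) - 2 * dot(b, x0) + c
    disc = q1 * q1 - 4 * q2 * q0
    if disc < 0:
        return []
    root = math.sqrt(disc)
    return sorted([(-q1 - root) / (2 * q2), (-q1 + root) / (2 * q2)])


# Unit circle x^2 + y^2 - 1 = 0 cut by the x axis and by the line y = 2.
E = [[1, 0], [0, 1]]
print(line_intersections(E, [0, 0], -1, [0, 0], [1, 0]))
print(line_intersections(E, [0, 0], -1, [0, 2], [1, 0]))
```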

We shall say that the straight line (95.3) with direction vector $l$
has a nonasymptotic (asymptotic) direction relative to
hypersurface (95.2) if $(Al, l) \ne 0$ ($(Al, l) = 0$).

Consider any straight line with a nonasymptotic direction $l$
intersecting a hypersurface. The points of intersection determine on
every such line a segment, which by analogy with elementary geometry
will be called a chord. We denote by $\mathcal{A}$ the set of the
midpoints of all chords. If the ends of a chord collapse at a point,
then that point will be regarded also as the midpoint of the chord. We
show that $\mathcal{A}$ is in some hyperplane.

The ends of any chord are given by the values of a parameter $t$
coinciding with the roots of (95.4). Therefore the midpoint of a chord
is given by the value of $t$ equal to a half-sum of the roots. According
to Vieta’s formulas this yields

$$t = \frac{(b, l) - (Al, x_0)}{(Al, l)}.$$

If $z_0$ is the midpoint of a chord, then

$$z_0 = x_0 + \frac{(b, l) - (Al, x_0)}{(Al, l)}\, l. \tag{95.5}$$

Now we have

$$(Al, z_0) = (Al, x_0) + \frac{(b, l) - (Al, x_0)}{(Al, l)}\, (Al, l) = (b, l).$$

So the midpoints of all chords satisfy the equation

$$(Al, x) = (b, l). \tag{95.6}$$

Since its right-hand side is independent of $x_0$, according to (46.8)
this equation defines a hyperplane whose normal vector is equal to $Al$.

Hyperplane (95.6) is called a diametrical hyperplane conjugate to 
a direction l relative to hypersurface (95.2). 
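
The fact that the midpoints depend on $x_0$ only through a point of the hyperplane (95.6) is easy to confirm on an example. The sketch below evaluates (95.5) for several starting points $x_0$ and one fixed direction; the matrix, the vectors and the helper names are assumptions chosen for illustration. The formula is applied whether or not the roots of (95.4) are real, since only their half-sum is used.

```python
from fractions import Fraction


def dot(u, v):
    return sum(p * q for p, q in zip(u, v))


def matvec(a, v):
    return [dot(row, v) for row in a]


def chord_midpoint(a, b, x0, l):
    """Point x0 + l*t of (95.5), t being the half-sum of the roots of (95.4)."""
    al = matvec(a, l)
    denom = dot(al, l)
    if denom == 0:
        raise ValueError("asymptotic direction: (Al, l) = 0")
    t = Fraction(dot(b, l) - dot(al, x0), denom)
    return [p + t * q for p, q in zip(x0, l)]


A = [[2, 1], [1, 3]]
b = [1, 2]
l = [1, 1]
Al = matvec(A, l)
for x0 in ([0, 0], [1, -2], [5, 7]):
    z0 = chord_midpoint(A, b, x0, l)
    # Every midpoint satisfies (Al, z0) = (b, l), i.e. equation (95.6).
    print(z0, dot(Al, z0) == dot(b, l))
```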

The explicit form of the equation of a diametrical hyperplane
allows us to establish a number of important properties of
second-degree hypersurfaces. Let $A$ be a nonsingular matrix. Then for any
linearly independent vectors $l_1, l_2, \dots, l_n$ so are vectors
$Al_1, Al_2, \dots, Al_n$. Suppose further that all directions
$l_1, l_2, \dots, l_n$ are nonasymptotic. This will clearly hold, for
example, when a quadratic form $(Ax, x)$ is positive definite. Hence it
is possible to construct a system of $n$ diametrical hyperplanes
conjugate to $l_1, l_2, \dots, l_n$. The hyperplanes will have in common
a unique point $x^*$. From (95.6) we now get

$$(Ax^* - b, l_i) = 0$$

for $i = 1, 2, \dots, n$. By virtue of the linear independence of the
vectors $l_i$ this means that $Ax^* - b = 0$, i.e. that the point $x^*$ is
nothing but a solution of the system of linear algebraic equations

$$Ax = b. \tag{95.7}$$

The solution of the system with a nonsingular matrix is unique, so
the constructed point $x^*$ is in fact independent of the choice of vectors
$l_1, l_2, \dots, l_n$.

Simple calculations show that for any point $x^*$

$$(Ax, x) - 2(b, x) + c = (A(x - x^*), x - x^*) + 2(Ax^* - b, x - x^*) + (Ax^*, x^*) - 2(b, x^*) + c. \tag{95.8}$$

If, however, $x^*$ is a solution of system (95.7), then with respect to
such a point hypersurface (95.2) has an important symmetry property.
That is, for any $x$ the left-hand side of (95.2) assumes the same
values at the points

$$x = x^* + (x - x^*), \qquad x' = x^* - (x - x^*). \tag{95.9}$$

It follows in particular that both points $x$ and $x'$ are or are not
simultaneously on hypersurface (95.2). The equation

$$x^* = \tfrac{1}{2}(x + x')$$

allows us to call the point $x^*$ the centre of symmetry of the
hypersurface. If on (95.2) there is at least one point of $R_n$, then the
centre of symmetry is said to be real. Otherwise it is called imaginary.

Now let $x^*$ be the centre of symmetry, i.e. for any $x$ the left-hand
side of (95.2) assumes the same values at the points $x$ and $x'$. Hence

$$(Ax, x) - 2(b, x) + c = (Ax', x') - 2(b, x') + c.$$

According to (95.8) and (95.9) this is possible only if for any $x$

$$(Ax^* - b, x - x^*) = 0.$$

But the last identity holds if and only if $Ax^* - b = 0$, i.e. if $x^*$
is a solution of system (95.7). Notice that here we have never assumed
either the nonsingularity of the matrix $A$ or the presence of any
other of its properties besides symmetry. Therefore:

For a system Ax = b to have a solution it is necessary and sufficient 
that hypersurface (95.2) should have a centre of symmetry. The set of all 
solutions coincides with that of all centres of symmetry. 
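
The symmetry property can be checked directly on a small example. In the sketch below the left-hand side of (95.2) is evaluated at a point and at its reflection in a candidate centre; the data and the names `quadric` and `reflect` are assumptions made for illustration, and the solution $x^* = (1/5, 3/5)$ of the sample system was found by hand.

```python
from fractions import Fraction


def quadric(a, b, c, x):
    """Left-hand side (Ax, x) - 2(b, x) + c of equation (95.2)."""
    ax = [sum(p * q for p, q in zip(row, x)) for row in a]
    return (sum(p * q for p, q in zip(ax, x))
            - 2 * sum(p * q for p, q in zip(b, x)) + c)


def reflect(center, x):
    """Point x' = x* - (x - x*) symmetric to x about the centre x*."""
    return [2 * s - p for s, p in zip(center, x)]


A = [[2, 1], [1, 3]]
b = [1, 2]
c = -7
x_star = [Fraction(1, 5), Fraction(3, 5)]   # solves Ax = b for this data

for x in ([0, 0], [1, -2], [3, 4]):
    same = quadric(A, b, c, x) == quadric(A, b, c, reflect(x_star, x))
    print(x, same)

# A point that does not solve Ax = b is not a centre of symmetry:
print(quadric(A, b, c, [1, 0]) == quadric(A, b, c, reflect([0, 0], [1, 0])))
```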

Thus there emerges a far-reaching connection between systems of 
linear algebraic equations and second-degree hypersurfaces. It is 
widely used in constructing a host of computational algorithms. 
Construction of a system of diametrical hyperplanes is central to 
a large group of methods among the so-called methods of conjugate 
directions. These will be discussed in the last chapter. 

In general investigation of second-degree hypersurfaces may be
based on reducing them to canonical form in much the same way
as quadratic forms are. But in addition to linear nonsingular
transformations of variables, translations are required.

Consider any transformation of variables $x = Py$ reducing a quadratic
form $(Ax, x)$ to normal form. In variables $y_1, y_2, \dots, y_n$ the
equation of a hypersurface will have the form

$$y_1^2 + \dots + y_k^2 - y_{k+1}^2 - \dots - y_r^2 - 2d_1y_1 - \dots - 2d_ry_r - 2d_{r+1}y_{r+1} - \dots - 2d_ny_n + c = 0.$$

Now we translate the variables by the formulas

$$z_i = \begin{cases} y_i - d_i, & 1 \le i \le k, \\ y_i + d_i, & k + 1 \le i \le r, \\ y_i, & r + 1 \le i \le n. \end{cases}$$

In these variables the equation is as follows:

$$z_1^2 + \dots + z_k^2 - z_{k+1}^2 - \dots - z_r^2 - 2d_{r+1}z_{r+1} - \dots - 2d_nz_n + p = 0.$$

Let one of the numbers $d_{r+1}, \dots, d_n$, for example $d_n$, be other
than 0. Set

$$v_i = \begin{cases} z_i, & i < n, \\ d_{r+1}z_{r+1} + \dots + d_nz_n, & i = n, \end{cases}$$

and then make another translation

$$u_i = \begin{cases} v_i, & i < n, \\ v_i - \dfrac{p}{2}, & i = n. \end{cases}$$

Now the equation of a hypersurface is as follows:

$$\pm u_1^2 \pm u_2^2 \pm \dots \pm u_r^2 - 2u_n = 0, \qquad 1 \le r \le n - 1. \tag{95.10}$$

If there is no nonzero number among $d_{r+1}, \dots, d_n, p$, then the
equation of a hypersurface assumes the form

$$\pm u_1^2 \pm u_2^2 \pm \dots \pm u_r^2 = 0, \qquad 1 \le r \le n. \tag{95.11}$$

And finally if $d_{r+1}, \dots, d_n$ are zero but $p \ne 0$, then putting
$u_i = z_i / |p|^{1/2}$ for every $i$ we obtain one more form of the
equation of a hypersurface. Namely,

$$\pm u_1^2 \pm u_2^2 \pm \dots \pm u_r^2 \pm 1 = 0, \qquad 1 \le r \le n. \tag{95.12}$$

Because of the law of inertia for quadratic forms, surfaces given
by different equations of the form (95.10) to (95.12) cannot be
converted into one another using linear transformation of variables and
translation. As different one should regard equations that cannot be
converted into one another by multiplying by $(-1)$ and changing
the indices of the coordinates. As in the case of quadratic forms, we
have again obtained a subdivision of all second-degree hypersurfaces
into nonoverlapping classes.

Not infrequently reduction of second-degree hypersurfaces to
canonical form uses only operations of translation and linear
transformations of variables with orthogonal matrices. This is mainly
due to both types of the transformations leaving unchanged the
distances between points. In this case the canonical forms will be
somewhat different, although on the whole they are obtained in the
same way as those considered above. For example, in the case of $R_2$
a second-degree hypersurface can be reduced only to one of the
following forms:

$$\begin{aligned}
&\text{I.} && \lambda_1x^2 + \lambda_2y^2 + a_0 = 0, \\
&\text{II.} && \lambda_2y^2 + b_0x = 0, \\
&\text{III.} && \lambda_1x^2 + a_0 = 0,
\end{aligned} \tag{95.13}$$

in the case of $R_3$ it can be reduced to one of the following forms:

$$\begin{aligned}
&\text{I.} && \lambda_1x^2 + \lambda_2y^2 + \lambda_3z^2 + a_0 = 0, \\
&\text{II.} && \lambda_1x^2 + \lambda_2y^2 + b_0z = 0, \\
&\text{III.} && \lambda_1x^2 + \lambda_2y^2 + a_0 = 0, \\
&\text{IV.} && \lambda_2y^2 + b_0x = 0, \\
&\text{V.} && \lambda_1x^2 + a_0 = 0.
\end{aligned} \tag{95.14}$$

In all equations (95.13) and (95.14) the coefficients of the variables
written out are nonzero. The free term may be zero. According to
the well-established terminology hypersurfaces in $R_2$ will be called
second-degree curves and those in $R_3$ second-degree surfaces. Considering
the interests of many branches of mathematics we shall study in more
detail second-degree curves and surfaces using their canonical forms
(95.13) and (95.14).


Exercises 

1. Let $A$ be a positive definite matrix. Prove that on
the solution of the system $Ax = b$ the expression $(Ax, x) - 2(b, x)$
reaches its minimum.

2. Let A be a positive definite matrix. Prove that on the straight line (95.3) 
the expression (Ax, x) — 2 (b, x) reaches its minimum for the value of t in (95.5). 

3. Prove that for any direction to be nonasymptotic for hypersurface (95.2) 
it is necessary and sufficient that the quadratic form (Ax, x) should be either 
positive definite or negative definite. 

4. What symmetry property has a diametrical hyperplane conjugate to 
a direction l if l is an eigenvector of a matrix A corresponding to a nonzero 
eigenvalue? 

5. Prove that the system $Ax = b$ has no solution if and only if
hypersurface (95.2) can be reduced to canonical form (95.10).


96. Second-degree curves 


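We shall study second-degree curves using
equations (95.13). Let the equation of a curve be of the form

$$\lambda_1x^2 + \lambda_2y^2 + a_0 = 0. \tag{96.1}$$

I.1. The number $a_0$ is not zero; the numbers $\lambda_1$ and $\lambda_2$
are opposite in sign to $a_0$. We write (96.1) as

$$\frac{x^2}{-a_0/\lambda_1} + \frac{y^2}{-a_0/\lambda_2} = 1$$

and set

$$a = \left(-\frac{a_0}{\lambda_1}\right)^{1/2}, \qquad b = \left(-\frac{a_0}{\lambda_2}\right)^{1/2}. \tag{96.2}$$

Under the hypothesis $a$ and $b$ are real numbers, so (96.1) is
equivalent to

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1. \tag{96.3}$$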

The curve described by this equation is called an ellipse (Fig. 96.1)
and the equation is called the canonical equation of an ellipse. We
show some properties of the ellipse. An ellipse is a bounded curve.
It follows from equation (96.3) that for all points of an ellipse we
have $|x| \le a$ and $|y| \le b$. An ellipse has two axes of symmetry,
the $x$ axis and the $y$ axis, and a centre of symmetry, the origin. This
follows from the fact that apart from a point with coordinates $(x, y)$
an ellipse contains points with coordinates $(x, -y)$, $(-x, y)$ and
$(-x, -y)$. The axes of symmetry are called the principal axes
of the ellipse and the centre of symmetry is the centre of the
ellipse. If $a > b$, then the $x$ axis is called the major axis of
the ellipse and the $y$ axis is the minor axis. The points of
intersection of the principal axes of the ellipse with the ellipse itself
are called the vertices of the ellipse.

When $a = b$, the ellipse is a circle of radius $a$ with centre at the
origin. Suppose for definiteness that $a > b$ and let

$$c^2 = a^2 - b^2. \tag{96.4}$$

The points $F_1$ and $F_2$ with coordinates $(-c, 0)$ and $(+c, 0)$ are
called the foci of the ellipse.

Theorem 96.1. The sum of the distances from any point of an ellipse 
to its foci is a constant value equal to 2a. 

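Proof. For any point $M(x, y)$ of an ellipse

$$y^2 = b^2 - \frac{b^2x^2}{a^2}.$$

For the same point

$$\begin{aligned}
\rho(M, F_2) &= \left((x - c)^2 + y^2\right)^{1/2} = \left(x^2 - 2xc + c^2 + b^2 - \frac{b^2x^2}{a^2}\right)^{1/2} \\
&= \left(x^2\left(1 - \frac{b^2}{a^2}\right) - 2xc + a^2\right)^{1/2} = \left(\frac{c^2x^2}{a^2} - 2xc + a^2\right)^{1/2} \\
&= \left(\left(-\frac{c}{a}x + a\right)^2\right)^{1/2} = -\frac{c}{a}x + a.
\end{aligned}$$

The last equation holds, since $-cx/a + a > 0$ in view of the fact
that $|x| \le a$ and $c/a < 1$. Further

$$\begin{aligned}
\rho(M, F_1) &= \left((x + c)^2 + y^2\right)^{1/2} = \left(x^2 + 2xc + c^2 + b^2 - \frac{b^2x^2}{a^2}\right)^{1/2} \\
&= \left(\left(\frac{c}{a}x + a\right)^2\right)^{1/2} = \frac{c}{a}x + a.
\end{aligned}$$

Finally we have

$$\rho(M, F_1) + \rho(M, F_2) = \frac{c}{a}x + a - \frac{c}{a}x + a = 2a.$$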
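
Theorem 96.1 lends itself to a quick numerical check. The sketch below samples points of the ellipse through the parametrization $(a\cos t, b\sin t)$, which is an assumption of this example and is not used in the text, and measures the distances to the two foci; the function name `ellipse_focal_distances` is hypothetical.

```python
import math


def ellipse_focal_distances(a, b, t):
    """Distances from (a cos t, b sin t) to F1 = (-c, 0) and F2 = (c, 0).

    Assumes a > b > 0, so that c = (a^2 - b^2)^(1/2) is real.
    """
    c = math.sqrt(a * a - b * b)
    x, y = a * math.cos(t), b * math.sin(t)
    return math.hypot(x + c, y), math.hypot(x - c, y)


a, b = 5.0, 3.0          # here c = 4
for k in range(8):
    r1, r2 = ellipse_focal_distances(a, b, k * math.pi / 4)
    print(round(r1, 6), round(r2, 6), round(r1 + r2, 6))
```

Every printed sum agrees with $2a = 10$ up to rounding, and the individual distances agree with the expressions $a + cx/a$ and $a - cx/a$ obtained in the proof.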

I.2. The number $a_0$ is not zero; the numbers $\lambda_1$, $\lambda_2$
and $a_0$ have the same sign. Let

$$a = \left(\frac{a_0}{\lambda_1}\right)^{1/2}, \qquad b = \left(\frac{a_0}{\lambda_2}\right)^{1/2}. \tag{96.5}$$

Then (96.1) is equivalent to

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = -1. \tag{96.6}$$

It is clear that there is no point in the plane satisfying (96.6). It
is usual to say that (96.6) is the equation of an imaginary ellipse.

I.3. The number $a_0$ is zero; the numbers $\lambda_1$ and $\lambda_2$
have the same sign. Let

$$a = \left(\frac{1}{|\lambda_1|}\right)^{1/2}, \qquad b = \left(\frac{1}{|\lambda_2|}\right)^{1/2}.$$

Then (96.1) is equivalent to

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 0. \tag{96.7}$$

It is clear that only the origin satisfies (96.7). It is usual to say
that (96.7) is the equation of a degenerate ellipse.

I.4. The number $a_0$ is not zero; the numbers $\lambda_1$ and $\lambda_2$
are opposite in sign. Introducing new coefficients similar to (96.2) and
(96.5) we reduce (96.1) to an equivalent equation (up to a designation
of the variables)

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1. \tag{96.8}$$

The curve described by this equation is called a hyperbola (Fig. 96.2)
and the equation is called the canonical equation of a hyperbola.
Unlike the ellipse, the hyperbola is an unbounded curve. As in the
ellipse, the axes of symmetry of the hyperbola are the coordinate axes
and its centre of symmetry is the origin. The axes of symmetry are the
principal axes of the hyperbola and the centre of symmetry is the
centre of the hyperbola. One of the principal axes (the $x$ axis)
intersects with the hyperbola at two points, which are called
the vertices of the hyperbola. It is called the transverse (or real) axis
of the hyperbola. The other axis (the $y$ axis) has no points in common
with the hyperbola and is therefore called the imaginary
(or conjugate) axis of the hyperbola. Let

$$c^2 = a^2 + b^2.$$

The points $F_1$ and $F_2$ with coordinates $(-c, 0)$ and $(+c, 0)$ are
called the foci of the hyperbola.

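Theorem 96.2. The absolute value of the difference of the distances
from any point of a hyperbola to its foci is a constant value equal to $2a$.

Proof. For any point $M(x, y)$ of a hyperbola

$$y^2 = -b^2 + \frac{b^2x^2}{a^2}.$$

For the same point

$$\begin{aligned}
\rho(M, F_2) &= \left((x - c)^2 + y^2\right)^{1/2} = \left(x^2 - 2xc + c^2 - b^2 + \frac{b^2x^2}{a^2}\right)^{1/2} \\
&= \left(x^2\left(1 + \frac{b^2}{a^2}\right) - 2xc + a^2\right)^{1/2} = \left(\frac{c^2x^2}{a^2} - 2xc + a^2\right)^{1/2} \\
&= \left(\left(\frac{c}{a}x - a\right)^2\right)^{1/2} = \left|\frac{c}{a}x - a\right|.
\end{aligned}$$

Further

$$\begin{aligned}
\rho(M, F_1) &= \left((x + c)^2 + y^2\right)^{1/2} = \left(x^2 + 2xc + c^2 - b^2 + \frac{b^2x^2}{a^2}\right)^{1/2} \\
&= \left(x^2\left(1 + \frac{b^2}{a^2}\right) + 2xc + a^2\right)^{1/2} = \left(\frac{c^2x^2}{a^2} + 2xc + a^2\right)^{1/2} \\
&= \left(\left(\frac{c}{a}x + a\right)^2\right)^{1/2} = \left|\frac{c}{a}x + a\right|.
\end{aligned}$$

For all points of a hyperbola we have $|x| \ge a$ and $c/a > 1$. Therefore

$$\rho(M, F_2) = \begin{cases} \dfrac{c}{a}x - a & \text{for } x > 0, \\[2mm] -\dfrac{c}{a}x + a & \text{for } x < 0, \end{cases}
\qquad
\rho(M, F_1) = \begin{cases} \dfrac{c}{a}x + a & \text{for } x > 0, \\[2mm] -\dfrac{c}{a}x - a & \text{for } x < 0. \end{cases}$$

Finally

$$|\rho(M, F_1) - \rho(M, F_2)| = 2a.$$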
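
The statement of Theorem 96.2 can be checked numerically in the same spirit. The sketch assumes the parametrization $(\pm a\cosh t, b\sinh t)$ of the two branches, which is introduced here only for sampling points; the function name `hyperbola_focal_distances` is hypothetical.

```python
import math


def hyperbola_focal_distances(a, b, t, right_branch=True):
    """Distances from (+-a cosh t, b sinh t) to F1 = (-c, 0) and F2 = (c, 0)."""
    c = math.sqrt(a * a + b * b)
    x = a * math.cosh(t) if right_branch else -a * math.cosh(t)
    y = b * math.sinh(t)
    return math.hypot(x + c, y), math.hypot(x - c, y)


a, b = 3.0, 4.0          # here c = 5
for t in (-2.0, -0.5, 0.0, 1.0, 2.0):
    for branch in (True, False):
        r1, r2 = hyperbola_focal_distances(a, b, t, branch)
        print(branch, round(abs(r1 - r2), 6))
```

On the right branch the point is nearer to $F_2$ and on the left branch nearer to $F_1$, but the absolute value of the difference stays equal to $2a = 6$ up to rounding.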

Consider the part of the hyperbola that is in the first quarter. For
that part $x \ge a$ and $y \ge 0$, and equation (96.8) is equivalent to

$$y = \frac{b}{a}\left(x^2 - a^2\right)^{1/2},$$

assuming of course $b > 0$ and $a > 0$. It is easy to see that this
function can be represented in the following form:

$$y = \frac{b}{a}x - \frac{ba}{x + \sqrt{x^2 - a^2}}. \tag{96.9}$$

Along with function (96.9) consider the equation of a straight line

$$y' = \frac{b}{a}x. \tag{96.10}$$

Let $M(x, y)$ and $M'(x, y')$ denote a point of hyperbola (96.9) and
that of the straight line (96.10) having the same abscissa $x$. With $x$
increasing without limit, the difference

$$y' - y = \frac{ba}{x + \sqrt{x^2 - a^2}},$$

remaining positive, is monotonically decreasing and vanishing.
Hence $M$ and $M'$ converge but $M$ always remains below $M'$.

A similar property holds for the other parts of the hyperbola too,
the role of the straight line (96.10) being played by one of the
straight lines

$$y = \frac{b}{a}x, \qquad y = -\frac{b}{a}x. \tag{96.11}$$

These are called the asymptotes of the hyperbola.

Notice that we said that (96.8) was the equation of a hyperbola.
However, another equation also called the equation of a hyperbola
is known from the school course.

According to (96.11) we make a change of coordinates

$$x' = \frac{x}{a} - \frac{y}{b}, \qquad y' = \frac{x}{a} + \frac{y}{b}.$$

From (96.8) we have

$$\left(\frac{x}{a} - \frac{y}{b}\right)\left(\frac{x}{a} + \frac{y}{b}\right) = 1.$$

Hence in the new coordinate system (not rectangular, in general) the
equation of a hyperbola has the form

$$x'y' = 1 \tag{96.12}$$

or

$$y' = \frac{1}{x'}.$$

This is just the familiar school-book equation. Equation (96.12) is
called the equation of a hyperbola in its asymptotes.

I.5. The number $a_0$ is zero; the numbers $\lambda_1$ and $\lambda_2$
are opposite in sign. After a standard change of coefficients we get

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 0, \tag{96.13}$$

equivalent to (96.1). From (96.13) we get

$$\left(\frac{x}{a} - \frac{y}{b}\right)\left(\frac{x}{a} + \frac{y}{b}\right) = 0$$

or

$$y = \frac{b}{a}x, \qquad y = -\frac{b}{a}x. \tag{96.14}$$

Thus (96.13) is the equation of a curve splitting into two intersecting
straight lines (96.14).

Consider now the second equation of (95.13). It has the form

$$\lambda_2y^2 + b_0x = 0. \tag{96.15}$$

II.6. Both numbers $\lambda_2$ and $b_0$ are nonzero. Let

$$2p = -\frac{b_0}{\lambda_2} \ne 0.$$

Now (96.15) is equivalent to the following equation:

$$y^2 = 2px. \tag{96.16}$$

The curve described by this equation is called a parabola (Fig. 96.3)
and the equation is called the canonical equation of a parabola. It
may be assumed without loss of generality that $p > 0$, since for
$p < 0$ we obtain a curve symmetric with respect to the $y$ axis. Like
the hyperbola, the parabola is an unbounded curve. It has only one
axis of symmetry, the $x$ axis, and no centre of symmetry. The point
of intersection of the axis of the parabola with the parabola itself
is called the vertex of the parabola. The point $F$ with coordinates
$(p/2, 0)$ is called the focus of the parabola. A straight line $L$ given by

$$x = -\frac{p}{2} \tag{96.17}$$

is called the directrix of the parabola.

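Theorem 96.3. The distance from any point of a parabola to its
directrix is equal to that from the same point to the focus of the parabola.

Proof. We have for any point $M(x, y)$ of a parabola

$$\rho(L, M) = x + \frac{p}{2},$$

and further

$$\begin{aligned}
\rho(F, M) &= \left(\left(x - \frac{p}{2}\right)^2 + y^2\right)^{1/2} = \left(\left(x - \frac{p}{2}\right)^2 + 2px\right)^{1/2} \\
&= \left(x^2 - px + \frac{p^2}{4} + 2px\right)^{1/2} = \left(x^2 + px + \frac{p^2}{4}\right)^{1/2} \\
&= \left(\left(x + \frac{p}{2}\right)^2\right)^{1/2} = x + \frac{p}{2},
\end{aligned}$$

since $x \ge 0$ and $p > 0$.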
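
A numerical check of Theorem 96.3 takes only a few lines. The sketch assumes $p > 0$ and picks points of the parabola by their ordinate $y$, so that $x = y^2/(2p)$; the function name `parabola_distances` is hypothetical.

```python
import math


def parabola_distances(p, y):
    """Distances from the point (y^2 / (2p), y) of y^2 = 2px
    to the focus F = (p/2, 0) and to the directrix x = -p/2."""
    x = y * y / (2 * p)
    to_focus = math.hypot(x - p / 2, y)
    to_directrix = x + p / 2
    return to_focus, to_directrix


for y in (-3.0, -1.0, 0.0, 0.5, 4.0):
    print(parabola_distances(2.0, y))
```

The two numbers in every printed pair coincide up to rounding, as the theorem states.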

Finally, consider the third equation of (95.13). It has a quite
simple form:

$$\lambda_1x^2 + a_0 = 0. \tag{96.18}$$

III.7. The number $a_0$ is not zero; the number $\lambda_1$ is opposite in
sign to $a_0$. Let

$$a = \left(-\frac{a_0}{\lambda_1}\right)^{1/2}.$$

Then the equation of a curve (96.18) is equivalent to

$$x^2 - a^2 = 0 \tag{96.19}$$

or

$$x = a, \qquad x = -a. \tag{96.20}$$

Hence the equation of a curve (96.19) is the equation of a curve
splitting into two parallel lines (96.20).

III.8. The number $a_0$ is not zero; the number $\lambda_1$ coincides in
sign with $a_0$. Let

$$a = \left(\frac{a_0}{\lambda_1}\right)^{1/2}.$$

Then (96.18) is equivalent to

$$x^2 + a^2 = 0. \tag{96.21}$$

It is clear that there is no point in the plane whose coordinates
satisfy that equation. It is usual to say that (96.21) is the equation of
two imaginary straight lines.

III.9. The number $a_0$ is zero. In this case (96.18) is equivalent to

$$x^2 = 0. \tag{96.22}$$

By analogy with (96.19) it is usual to say that (96.22) defines two
coincident straight lines, each defined by

$$x = 0.$$
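
The nine cases I.1 to III.9 depend only on the signs of the coefficients in (95.13), so they can be summarized as a small decision procedure. The sketch below is an illustration; the function names and the returned strings are hypothetical, and the coefficients are assumed to be given already in canonical form with the written-out coefficients nonzero.

```python
def classify_form_I(lam1, lam2, a0):
    """Curve lam1*x^2 + lam2*y^2 + a0 = 0 with lam1, lam2 nonzero."""
    if lam1 == 0 or lam2 == 0:
        raise ValueError("lam1 and lam2 must be nonzero in form I")
    if (lam1 > 0) == (lam2 > 0):
        if a0 == 0:
            return "degenerate ellipse"                 # case I.3
        opposite = (a0 > 0) != (lam1 > 0)
        return "ellipse" if opposite else "imaginary ellipse"   # I.1, I.2
    if a0 != 0:
        return "hyperbola"                              # case I.4
    return "two intersecting lines"                     # case I.5


def classify_form_II(lam2, b0):
    """Curve lam2*y^2 + b0*x = 0 with lam2, b0 nonzero."""
    if lam2 == 0 or b0 == 0:
        raise ValueError("lam2 and b0 must be nonzero in form II")
    return "parabola"                                   # case II.6


def classify_form_III(lam1, a0):
    """Curve lam1*x^2 + a0 = 0 with lam1 nonzero."""
    if lam1 == 0:
        raise ValueError("lam1 must be nonzero in form III")
    if a0 == 0:
        return "two coincident lines"                   # case III.9
    if (a0 > 0) != (lam1 > 0):
        return "two parallel lines"                     # case III.7
    return "two imaginary lines"                        # case III.8


print(classify_form_I(1, 4, -4), classify_form_I(1, -1, 2),
      classify_form_III(2, -8))
```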


Notice that for all points $M(x, y)$ of an ellipse or a hyperbola we have
the following equations:

$$\rho(M, F_1) = \left| a - \frac{c}{a}\,x \right|, \qquad \rho(M, F_2) = \left| a + \frac{c}{a}\,x \right|. \tag{96.23}$$

Straight lines $d_i$ ($i = 1, 2$) defined by

$$x - \frac{a^2}{c} = 0, \qquad x + \frac{a^2}{c} = 0 \tag{96.24}$$

are called the directrices of the ellipse and of the hyperbola. We
shall label the directrix and the focus with the same index if they
are in the same half-plane given by the y axis. We can now show that:

The ratio of the distances $\rho(M, F_i)$ and $\rho(M, d_i)$ is constant for all
points M of an ellipse, a hyperbola and a parabola.

For the parabola this statement follows from Theorem 96.3. For
the ellipse and the hyperbola it follows from (96.23) and (96.24).
The ratio

$$e = \frac{\rho(M, F_i)}{\rho(M, d_i)}$$

is called the eccentricity. We have:

$$e = \left(1 - \frac{b^2}{a^2}\right)^{1/2} < 1$$

for the ellipse,

$$e = \left(1 + \frac{b^2}{a^2}\right)^{1/2} > 1$$

for the hyperbola,

$$e = 1$$

for the parabola.
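
The constancy of the ratio of distances can be checked numerically. The
following is only an illustrative sketch, not part of the original text: the
function names and the sample semiaxes are assumptions, the curves are taken
in the canonical forms $x^2/a^2 \pm y^2/b^2 = 1$ and $y^2 = 2px$, and the focus
and the directrix lying in the right half-plane are used for the ellipse and
the hyperbola.

```python
import math

def ellipse_ratio(a, b, t):
    """rho(M, F) / rho(M, d) for M = (a cos t, b sin t) on the ellipse
    x^2/a^2 + y^2/b^2 = 1 with a > b > 0 (right focus and directrix)."""
    c = math.sqrt(a * a - b * b)          # distance from the centre to a focus
    x, y = a * math.cos(t), b * math.sin(t)
    return math.hypot(x - c, y) / abs(x - a * a / c)

def hyperbola_ratio(a, b, t):
    """Same ratio for M = (a cosh t, b sinh t) on x^2/a^2 - y^2/b^2 = 1."""
    c = math.sqrt(a * a + b * b)
    x, y = a * math.cosh(t), b * math.sinh(t)
    return math.hypot(x - c, y) / abs(x - a * a / c)

def parabola_ratio(p, y):
    """Same ratio for M = (y^2 / (2p), y) on the parabola y^2 = 2px, p > 0."""
    x = y * y / (2 * p)
    return math.hypot(x - p / 2, y) / abs(x + p / 2)
```

For a = 5, b = 3 the ellipse ratio is 0.8 for every t, for a = 3, b = 4 the
hyperbola ratio is 5/3, and for the parabola the ratio is always 1, in
agreement with the three values of the eccentricity.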


Exercises 

1. What is a diametrical hyperplane conjugate to 
a given direction for second-degree curves? 

2. Write the equation of a tangent for the ellipse, the hyperbola and the 
parabola. 

3. Prove that a light ray issuing from one focus of an ellipse passes, after 
a mirror reflection from a tangent, through the other focus. 

4. Prove that a light ray issuing from the focus of a parabola passes, after
a mirror reflection from a tangent, parallel to the axis of the parabola.

5. Prove that a light ray issuing from one focus of a hyperbola, after a mirror
reflection from a tangent, appears to issue from the other focus.


97. Second-degree surfaces 

We now proceed to study second-degree surfaces
given as equations (95.14). We first consider the equation

$$\lambda_1 x^2 + \lambda_2 y^2 + \lambda_3 z^2 + a_0 = 0. \tag{97.1}$$

I.1. The number $a_0$ is not zero; the numbers $\lambda_1$, $\lambda_2$ and $\lambda_3$ are
opposite in sign to $a_0$. A standard change of coefficients yields

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1. \tag{97.2}$$


The surface described by this equation is called an ellipsoid
(Fig. 97.1) and equation (97.2) is called the canonical equation of an
ellipsoid. It follows from (97.2) that the coordinate planes are planes
of symmetry and that the origin is a centre of symmetry. The numbers a,
b and c are the semiaxes of the ellipsoid. An ellipsoid is a bounded
surface contained in the parallelepiped $|x| \leq a$, $|y| \leq b$, $|z| \leq c$.
The curve of intersection of the ellipsoid with any plane is an ellipse.
Indeed, such a curve of intersection is a second-degree curve. By
virtue of the boundedness of the ellipsoid that curve is bounded, but
the only bounded second-degree curve is the ellipse.

I.2. The number $a_0$ is not zero; the numbers $\lambda_1$, $\lambda_2$, $\lambda_3$ and $a_0$ are all
of the same sign. A standard change of coefficients yields

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = -1. \tag{97.3}$$

There is no point in space whose coordinates satisfy this equation.
Equation (97.3) is said to be the equation of an imaginary ellipsoid.

I.3. The number $a_0$ is zero; the numbers $\lambda_1$, $\lambda_2$ and $\lambda_3$ are all of the
same sign. We have

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 0. \tag{97.4}$$

This equation holds only for the origin. Equation (97.4) is said to be
the equation of a degenerate ellipsoid.

I.4. The number $a_0$ is not zero; the numbers $\lambda_1$ and $\lambda_2$ have the same
sign, opposite to that of $\lambda_3$ and $a_0$. A standard change of coefficients
yields

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1. \tag{97.5}$$

The surface described by this equation is called a hyperboloid of
one sheet (Fig. 97.2) and the equation is the canonical equation of
a hyperboloid of one sheet. It follows from (97.5) that the coordinate
planes are planes of symmetry and the origin is a centre of symmetry.
Consider a curve $L_h$ of intersection of the one-sheeted hyperboloid
with a plane z = h. The equation of the projection of such a curve onto
the x, y plane is obtained from (97.5) if we assume z = h in it. It
is easy to see that that curve is an ellipse

$$\frac{x^2}{a^{*2}} + \frac{y^2}{b^{*2}} = 1,$$

where

$$a^* = a\sqrt{1 + h^2/c^2} \quad \text{and} \quad b^* = b\sqrt{1 + h^2/c^2},$$

its size increasing without limit for $h \to \pm\infty$. Sections of the
one-sheeted hyperboloid by the y, z and x, z planes are hyperbolas.
Thus the hyperboloid of one sheet is a surface consisting of one
sheet and resembling a tube extending without limit in the positive
and negative directions of the z axis.

I.5. The number $a_0$ is not zero; the numbers $\lambda_1$, $\lambda_2$ and $a_0$ are opposite
in sign to $\lambda_3$. Similarly to (97.5) we have

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = -1. \tag{97.6}$$

The surface described by this equation is called a hyperboloid of
two sheets (Fig. 97.3) and the equation is the canonical equation of
a hyperboloid of two sheets. The coordinate planes are planes of
symmetry and the origin is a centre of symmetry. Curves $L_h$ of
intersection of the two-sheeted hyperboloid with planes z = h are
ellipses the equations of whose projections onto the x, y plane have
the form

$$\frac{x^2}{a^{*2}} + \frac{y^2}{b^{*2}} = 1,$$

where $a^* = a\sqrt{-1 + h^2/c^2}$ and $b^* = b\sqrt{-1 + h^2/c^2}$. It follows that the
cutting plane z = h begins to intersect the two-sheeted hyperboloid
only when $|h| \geq c$. In the region between the planes z = -c and z = +c
there are no points of the surface under consideration. By virtue of
symmetry with respect to the x, y plane the surface consists of two
sheets lying outside that region. The sections of the two-sheeted
hyperboloid by the y, z and x, z planes are hyperbolas.


I.6. The number $a_0$ is zero; the numbers $\lambda_1$ and $\lambda_2$ are opposite in sign
to $\lambda_3$. We have

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 0. \tag{97.7}$$

The surface defined by this equation is called an elliptic cone
(Fig. 97.4) and the equation is the canonical equation of an elliptic
cone. The coordinate planes are planes of symmetry and the origin
is a centre of symmetry. Curves $L_h$ of intersection of the elliptic cone
with planes z = h are ellipses. If a point $M_0(x_0, y_0, z_0)$ is on the
surface of the cone, then (97.7) is satisfied by the coordinates of
a point $M_t(tx_0, ty_0, tz_0)$ for any number t. Hence the straight
line passing through $M_0$ and the origin lies entirely on the given
surface.

We now proceed to consider the second equation of (95.14). We
have

$$\lambda_1 x^2 + \lambda_2 y^2 + b_0 z = 0.$$

II.7. The numbers $\lambda_1$ and $\lambda_2$ are of the same sign. It may be assumed
without loss of generality that $b_0$ is opposite in sign to them, since if
$b_0$ coincides in sign with $\lambda_1$ and $\lambda_2$, we obtain a surface symmetric with
respect to the x, y plane. A standard change of coefficients yields

$$z = \frac{x^2}{a^2} + \frac{y^2}{b^2}. \tag{97.8}$$

The surface described by this equation is called an elliptic paraboloid
(Fig. 97.5) and the equation is the canonical equation of an elliptic
paraboloid. For this surface the x, z and y, z planes are planes of
symmetry and there is no centre of symmetry. The elliptic paraboloid
lies in the half-space $z \geq 0$. Curves $L_h$ of intersection of the
elliptic paraboloid with planes z = h, h > 0, are ellipses whose
projections onto the x, y plane are defined by

$$\frac{x^2}{a^{*2}} + \frac{y^2}{b^{*2}} = 1,$$

where $a^* = a\sqrt{h}$ and $b^* = b\sqrt{h}$. It follows that as h increases the
ellipses increase without limit, i.e. an elliptic paraboloid is an infinite cup.

Sections of an elliptic paraboloid by the planes y = h and x = h
are parabolas. For example, the plane x = h intersects the surface
in the parabola

$$z = \frac{h^2}{a^2} + \frac{y^2}{b^2}$$

lying in the plane x = h.

II.8. The numbers $\lambda_1$ and $\lambda_2$ are of opposite signs. A typical surface
for this case is defined by the equation

$$z = \frac{x^2}{a^2} - \frac{y^2}{b^2}. \tag{97.9}$$

The surface described by this equation is called a hyperbolic
paraboloid (Fig. 97.6) and the equation is the canonical equation of
a hyperbolic paraboloid. The x, z and y, z planes are planes of
symmetry and there is no centre of symmetry. Curves of intersection of
the hyperbolic paraboloid with planes z = h are hyperbolas

$$\frac{x^2}{a^{*2}} - \frac{y^2}{b^{*2}} = 1,$$

where $a^* = a\sqrt{h}$ and $b^* = b\sqrt{h}$, for h > 0, and hyperbolas

$$-\frac{x^2}{a^{*2}} + \frac{y^2}{b^{*2}} = 1,$$

where $a^* = a\sqrt{-h}$ and $b^* = b\sqrt{-h}$, for h < 0. The plane z = 0
intersects the hyperbolic paraboloid in two straight lines

$$y = \pm\frac{b}{a}\,x.$$

All surfaces defined by Equations III to V of (95.14) are independent
of z. Therefore projections onto the x, y plane of curves of
intersection of those surfaces with planes z = h are also independent
of h. Such surfaces are called cylinders with added adjectives: elliptic,
hyperbolic, etc. depending on the form of projection of the surface
onto the x, y plane.

Theorem 97.1. There are two distinct straight lines through each 
point of a one-sheeted hyperboloid and a hyperbolic paraboloid, lying 
entirely on those surfaces. 

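Proof. Consider a hyperboloid of one sheet given by its canonical
equation

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1. \tag{97.10}$$

For any $\alpha$ and $\beta$ not both zero a pair of planes

$$\alpha\left(\frac{x}{a} + \frac{z}{c}\right) = \beta\left(1 - \frac{y}{b}\right), \qquad \alpha\left(1 + \frac{y}{b}\right) = \beta\left(\frac{x}{a} - \frac{z}{c}\right) \tag{97.11}$$

determines some straight line $\Gamma$. It is easy to verify that the given
straight line $\Gamma$ lies entirely on surface (97.10). Moreover, there is
one straight line of family $\Gamma$ through each point of that surface.
Indeed, look at (97.11) as a system of two equations

$$\alpha\left(\frac{x}{a} + \frac{z}{c}\right) - \beta\left(1 - \frac{y}{b}\right) = 0, \qquad \alpha\left(1 + \frac{y}{b}\right) - \beta\left(\frac{x}{a} - \frac{z}{c}\right) = 0$$

in $\alpha$ and $\beta$. The determinant of the system is zero if and only if
a point M (x, y, z) is on hyperboloid (97.10). The rank of the matrix of
the system is obviously equal to unity. Hence $\alpha$ and $\beta$ are determined
up to proportionality. But this precisely means that there is a
unique straight line $\Gamma$ through each point of the hyperboloid.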
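
The argument can be tried out in exact arithmetic. The sketch below is an
illustration under stated assumptions rather than the author's construction:
it assumes that the first family of lines is cut out by the planes
$\alpha(x/a + z/c) = \beta(1 - y/b)$ and $\alpha(1 + y/b) = \beta(x/a - z/c)$, and all
function names are hypothetical. For a point of the hyperboloid it finds
$\alpha$ and $\beta$ up to proportionality, takes the cross product of the two
plane normals as the direction of the line, and checks that points of the
line stay on the surface.

```python
from fractions import Fraction

def on_hyperboloid(pt, a, b, c):
    """True if pt satisfies x^2/a^2 + y^2/b^2 - z^2/c^2 = 1 exactly."""
    x, y, z = (Fraction(v) for v in pt)
    return (x / a) ** 2 + (y / b) ** 2 - (z / c) ** 2 == 1

def ruling_parameters(pt, a, b, c):
    """(alpha, beta), up to a common factor, for the assumed family of lines."""
    x, y, z = (Fraction(v) for v in pt)
    u, v = x / a + z / c, x / a - z / c
    p, q = 1 - y / b, 1 + y / b
    # On the surface u*v == p*q, so the equations alpha*u = beta*p and
    # alpha*q = beta*v are proportional and have a one-parameter solution.
    return (p, u) if (p, u) != (0, 0) else (v, q)

def ruling_direction(alpha, beta, a, b, c):
    """Cross product of the normals of the two planes, scaled by a*b*c."""
    return (a * (alpha ** 2 - beta ** 2),
            2 * b * alpha * beta,
            -c * (alpha ** 2 + beta ** 2))

def line_point(pt, d, t):
    """Point pt + t*d of the line through pt with direction d."""
    return tuple(Fraction(p) + t * di for p, di in zip(pt, d))
```

For instance, with a = 2, b = 3, c = 5 the point (2, 6, 10) lies on the
surface, and every point `line_point(pt, d, t)` of the resulting line
satisfies `on_hyperboloid`.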

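Similarly we can see that through each point of a hyperboloid
there is a unique straight line $\Gamma^*$ determined by the planes

$$\alpha\left(\frac{x}{a} + \frac{z}{c}\right) = \beta\left(1 + \frac{y}{b}\right), \qquad \alpha\left(1 - \frac{y}{b}\right) = \beta\left(\frac{x}{a} - \frac{z}{c}\right).$$

The straight lines $\Gamma$ and $\Gamma^*$ are distinct. The same reasoning shows
that the hyperbolic paraboloid

$$z = \frac{x^2}{a^2} - \frac{y^2}{b^2}$$

is covered by two distinct families of straight lines $\Pi$ and $\Pi^*$
determined by the planes

$$\alpha z = \beta\left(\frac{x}{a} + \frac{y}{b}\right), \qquad \beta = \alpha\left(\frac{x}{a} - \frac{y}{b}\right)$$

and

$$\alpha z = \beta\left(\frac{x}{a} - \frac{y}{b}\right), \qquad \beta = \alpha\left(\frac{x}{a} + \frac{y}{b}\right).$$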

Exercises 

1. What is a diametrical hyperplane conjugate to 
a given direction for second-degree surfaces? 

2. Write the equations of a tangential plane for the various second-degree 
surfaces. 

3. Investigate the optical properties of second-degree surfaces. 




CHAPTER 12 

Bilinear Metric Spaces 


98. The Gram matrix and determinant 

Let $\varphi(x, y)$ be some bilinear form introduced
in a vector space $K_n$ over a number field P. The space $K_n$ is said to
be bilinear metric if each pair of vectors x and y from $K_n$ is assigned
a number (x, y) from P called a scalar product, with

$$(x, y) = \varphi(x, y).$$

If a bilinear form in a complex space $K_n$ is Hermitian, then $K_n$ is
said to be Hermitian bilinear metric. In these cases we shall also
say that a bilinear metric is introduced in the vector space.

Some similarity can be seen between bilinear metric spaces and
Euclidean and unitary spaces considered earlier. It should be stressed
from the outset, however, that there are significant differences.
Comparing the definitions of a scalar product in Euclidean and unitary
spaces with that of a bilinear form it is not hard to observe that
in bilinear metric spaces a scalar product need not in general be
symmetric or positive definite.

The study of Euclidean and unitary spaces reduced to the investigation
of additional properties of both the spaces and operators in
them arising from bilinear forms defining scalar products. The same
problem faces the study of bilinear metric spaces. The necessity of
introducing a weaker definition of a scalar product results from the
fact that the bilinear functions studied together with the vectors of
a space and the operators in it by no means always possess the
symmetry property, to say nothing of positive definiteness.

Many definitions and facts will be the same both for ordinary
bilinear metric spaces and for Hermitian bilinear metric spaces. Where no
confusion arises, the word “Hermitian” will therefore be dropped and
the appropriate calculations will be performed only for bilinear
spaces, tacitly assuming that for Hermitian spaces they are performed
in a similar way.

The main method of investigating any vector space is to expand
a vector with respect to a given system of vectors and to study how the
expansion depends on various factors. In a general vector space
there is no tool with which we could find an expansion, but in
a bilinear metric space the scalar product turns out to be such a tool.



Take a system of vectors $x_1, x_2, \ldots, x_m$ of a bilinear metric space
$K_n$ and a vector $x \in K_n$. Let us see what the presence of
a scalar product gives for the study of the possibility of an expansion

$$x = \alpha_1 x_1 + \alpha_2 x_2 + \ldots + \alpha_m x_m \tag{98.1}$$

of the vector x with respect to the chosen system. Performing successively
scalar multiplication of equation (98.1) on the right by $x_1$,
$x_2, \ldots, x_m$ we obtain a system of linear algebraic equations

$$\begin{aligned}
\alpha_1 (x_1, x_1) + \alpha_2 (x_2, x_1) + \ldots + \alpha_m (x_m, x_1) &= (x, x_1), \\
\alpha_1 (x_1, x_2) + \alpha_2 (x_2, x_2) + \ldots + \alpha_m (x_m, x_2) &= (x, x_2), \\
\ldots\ldots\ldots\ldots\ldots\ldots\ldots\ldots & \\
\alpha_1 (x_1, x_m) + \alpha_2 (x_2, x_m) + \ldots + \alpha_m (x_m, x_m) &= (x, x_m)
\end{aligned} \tag{98.2}$$

to determine the unknown coefficients $\alpha_1, \alpha_2, \ldots, \alpha_m$ of expansion
(98.1). The matrix G, the transpose of the matrix of that system, has the
form

$$G = \begin{pmatrix}
(x_1, x_1) & (x_1, x_2) & \ldots & (x_1, x_m) \\
(x_2, x_1) & (x_2, x_2) & \ldots & (x_2, x_m) \\
\ldots & \ldots & \ldots & \ldots \\
(x_m, x_1) & (x_m, x_2) & \ldots & (x_m, x_m)
\end{pmatrix} \tag{98.3}$$

and is called the Gram matrix of the system of vectors $x_1, x_2, \ldots, x_m$.
Its determinant $G(x_1, x_2, \ldots, x_m)$ is called the Gram determinant
or Gramian. Thus the problems of investigating expansions (98.1)
and of solving systems (98.2) prove to be closely related.

If vectors $x_1, x_2, \ldots, x_m$ form a basis of a space, then for them the
Gram matrix is the matrix of the basic bilinear form (x, y). The
Gram matrices for different bases are congruent and hence have the
same rank. The rank of a Gram matrix is an invariant of a bilinear
metric space and is called the rank of the space. The difference
between dimension and rank of a space is called the deficiency of the
space. If the deficiency is different from zero, the bilinear metric
space is said to be singular. The nonsingularity of the basic bilinear
form implies the nonsingularity of Gram matrices for all bases. In
this case the bilinear metric space is called nonsingular. For a
nonsingular space, system (98.2), where $x_1, \ldots, x_m$ is a basis, always has
a unique solution, since the matrix of the system is nonsingular,
and this enables us to investigate the coefficients of expansion (98.1)
as a solution of system (98.2).
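
The passage from expansion (98.1) to system (98.2) can be sketched in a few
lines of code. The following is a minimal illustration, not the book's own
algorithm: it assumes that the scalar product is given in coordinates by
$(x, y) = \sum_{i,j} x_i b_{ij} y_j$ with a matrix B, and the names `form`,
`gram`, `solve` and `expansion_coefficients` are hypothetical. Exact rational
arithmetic is used so that no rounding obscures the result.

```python
from fractions import Fraction

def form(x, y, B):
    """Scalar product (x, y) = sum over i, j of x[i] * B[i][j] * y[j]."""
    return sum(x[i] * B[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def gram(vectors, B):
    """Gram matrix (98.3): the entry in row i and column j is (x_i, x_j)."""
    return [[form(u, v, B) for v in vectors] for u in vectors]

def solve(A, rhs):
    """Solve A z = rhs by Gauss-Jordan elimination; A must be nonsingular."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(r)] for row, r in zip(A, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        M[col] = [v / M[col][col] for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n] for row in M]

def expansion_coefficients(x, vectors, B):
    """Coefficients of (98.1) found from system (98.2), whose matrix is
    the transpose of the Gram matrix."""
    G = gram(vectors, B)
    G_transposed = [list(col) for col in zip(*G)]
    rhs = [form(x, v, B) for v in vectors]
    return solve(G_transposed, rhs)
```

With the nonsymmetric matrix B = [[1, 2, 0], [0, 1, 1], [1, 0, 3]] and the
basis (1, 0, 0), (1, 1, 0), (1, 1, 1), the vector (4, 2, 3) yields the
coefficients 2, -1, 3. The Gram matrix itself is not symmetric here, which is
why the transpose in (98.2) matters.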

Suppose for some vectors x and y of a bilinear metric space $K_n$
we have (x, y) = 0. In this case the vector y is called right
orthogonal to x and x is called left orthogonal to y. In bilinear metric
spaces we are forced to distinguish between left orthogonality and
right orthogonality, since in the general case $(x, y) \neq (y, x)$. If,
however, (x, y) = (y, x) = 0, then such vectors will be called simply
orthogonal. Taking into account the linearity of a scalar product in
each independent variable, it is easy to verify that the right
orthogonality of y to vectors $x_1, \ldots, x_m$ implies its right orthogonality
to any linear combination of them. The same can be said concerning
the left orthogonality. In particular, it follows that for a vector of $K_n$
to be orthogonal to all vectors of a linear subspace L it is necessary
and sufficient that it should be orthogonal to the vectors of some
basis of L.

Lemma 98.1. If the Gram matrix of a system of vectors $x_1, x_2, \ldots, x_m$
is singular, then there are vectors u and v, nontrivial linear combinations
of vectors $x_1, x_2, \ldots, x_m$, such that u is right orthogonal and v is left
orthogonal to all vectors of the span of $x_1, x_2, \ldots, x_m$.

Proof. If the Gram matrix (98.3) is singular, then its rows are
linearly dependent. Hence there are numbers $\gamma_1, \gamma_2, \ldots, \gamma_m$ not all
zero such that a linear combination of the rows is zero, i.e.

$$\begin{aligned}
\gamma_1 (x_1, x_1) + \gamma_2 (x_2, x_1) + \ldots + \gamma_m (x_m, x_1) &= 0, \\
\gamma_1 (x_1, x_2) + \gamma_2 (x_2, x_2) + \ldots + \gamma_m (x_m, x_2) &= 0, \\
\ldots\ldots\ldots\ldots\ldots\ldots\ldots\ldots & \\
\gamma_1 (x_1, x_m) + \gamma_2 (x_2, x_m) + \ldots + \gamma_m (x_m, x_m) &= 0.
\end{aligned} \tag{98.4}$$

Letting

$$v = \sum_{j=1}^{m} \gamma_j x_j,$$

relations (98.4) imply that $(v, x_j) = 0$ for every j. The vector v is
a nontrivial linear combination of vectors $x_1, x_2, \ldots, x_m$ and is left
orthogonal to each of those vectors and therefore left orthogonal to each
vector of their span. The vector u is constructed in a similar way but
proceeding from the linear dependence of the columns of the Gram
matrix.

Corollary. If the Gram matrix for a linearly independent system of
vectors is singular, then the quadratic form (x, x) has an isotropic vector
lying in the span of the given system and right (left) orthogonal to all
vectors of the span.

Indeed, by virtue of the linear independence of the vectors of the
system the vectors u and v are nonzero; moreover, (u, u) = (v, v) = 0.

In a number of important cases the Gramian is a convenient tool 
for establishing the fact of linear dependence or independence of 
a system of vectors. 

Lemma 98.2. For any linearly dependent system of vectors the 
Gramian is zero. 

Proof. Let a system $x_1, x_2, \ldots, x_m$ be linearly dependent. Then
the zero vector x = 0 can be represented as a nontrivial linear combination
(98.1) of vectors $x_1, x_2, \ldots, x_m$. But in this case the right-hand sides
of (98.2) are zero, so the homogeneous
system (98.2) must have a nonzero solution. Hence the determinant
of the matrix of that system, i.e. the Gramian of the system $x_1$,
$x_2, \ldots, x_m$, is zero.

Theorem 98.1. If a quadratic form (x, x) has no isotropic vectors,
then the Gramian of a system of vectors is not zero if and only if the
system is linearly independent.

Proof. Necessity. Let the Gramian of a system of vectors $x_1, x_2, \ldots, x_m$
be nonzero. If that system were linearly dependent,
the Gramian would be zero by Lemma 98.2, which is impossible
under the hypothesis.

Sufficiency. Suppose the system of vectors is linearly independent.
If the Gramian is zero, then according to the corollary of Lemma 98.1
there must be an isotropic vector. Since this is impossible
under the hypothesis, the Gramian is not zero.

Corollary. If a quadratic form (x, x) is strictly of constant signs,
then the Gramian is zero if and only if the system of vectors is linearly
dependent.

Corollary. If a bilinear form (x, y) is symmetric and a quadratic
form (x, x) is strictly of constant signs, then for any two vectors x and y
we have the Cauchy-Buniakowski-Schwarz inequality

$$|(x, y)|^2 \leq (x, x)(y, y), \tag{98.5}$$

equality holding if and only if x and y are linearly dependent.

Under the hypotheses of this statement the Gramian of x and y, which
equals $(x, x)(y, y) - |(x, y)|^2$, will be positive
for linearly independent vectors x and y, according to Sylvester’s
criterion or its corollary, and zero for linearly dependent vectors,
according to Lemma 98.2. In both cases inequality (98.5) holds. If,
however, equality holds in (98.5), then x and y will be linearly
dependent according to the preceding corollary, since their Gramian
is zero.
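
A small computation shows both what these statements assert and why the
absence of isotropic vectors cannot be dropped. The sketch is illustrative
only: the matrices `B_pos` and `B_ind` and the function names are assumptions
chosen for the example, with the scalar product taken as
$(x, y) = \sum_{i,j} x_i b_{ij} y_j$.

```python
def form(x, y, B):
    """Scalar product (x, y) = sum over i, j of x[i] * B[i][j] * y[j]."""
    return sum(x[i] * B[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def gramian2(x, y, B):
    """Gramian of the pair x, y: det [[(x, x), (x, y)], [(y, x), (y, y)]]."""
    return form(x, x, B) * form(y, y, B) - form(x, y, B) * form(y, x, B)

# Symmetric positive definite form: the Gramian vanishes exactly for
# linearly dependent pairs, and inequality (98.5) holds.
B_pos = [[2, 1], [1, 3]]
g_independent = gramian2([1, 2], [3, -1], B_pos)   # 245
g_dependent = gramian2([1, 2], [3, 6], B_pos)      # 0

# Indefinite form: u is isotropic and orthogonal to w, so the linearly
# independent pair u, w nevertheless has a zero Gramian.
B_ind = [[1, 0, 0], [0, -1, 0], [0, 0, 1]]
u, w = [1, 1, 0], [1, 1, 1]
g_isotropic = gramian2(u, w, B_ind)                # 0
```

The last value is zero although u and w are linearly independent, so for a
form with isotropic vectors a vanishing Gramian no longer signals linear
dependence.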

Consider the following simple but rather important properties
of the Gramian. They not only lead to numerous consequences
but also often admit a clear-cut geometrical interpretation.

Property 1. The Gramian remains unaffected by an interchange of
any two vectors in a system $x_1, x_2, \ldots, x_m$.

Indeed, if any two vectors $x_i$ and $x_j$ are interchanged in $x_1, x_2, \ldots, x_m$,
then so are the ith and jth columns and the ith and jth rows
in the Gramian. The Gramian changes sign twice, i.e. as a result it
will remain unchanged.

Property 2. The Gramian remains unaffected by addition to any vector
of a system $x_1, x_2, \ldots, x_m$ of any linear combination of the remaining
vectors.

Obviously it suffices to consider the case where the vector $x_1$ is
changed, since the other cases reduce to that case in view of Property 1.
Let a vector $\alpha_2 x_2 + \ldots + \alpha_m x_m$ be added to $x_1$. Suppose the
bilinear form (x, y) is ordinary. It is easy to verify that the new
Gramian is obtained from the old Gramian by adding to the first
row the second row multiplied by $\alpha_2$, etc., up to the last row multiplied
by $\alpha_m$ and adding to the first column the second column multiplied
by $\alpha_2$, etc., up to the last column multiplied by $\alpha_m$. As is
known, this leaves the Gramian unaffected. If the bilinear form (x, y)
is Hermitian, then the columns are multiplied by $\bar{\alpha}_2, \ldots, \bar{\alpha}_m$.

Property 3. If any vector of a system $x_1, x_2, \ldots, x_m$ is multiplied by
a number $\alpha$, then the Gramian is multiplied by $\alpha^2$, if the bilinear form
(x, y) is ordinary, and by $|\alpha|^2$, if (x, y) is Hermitian.

Again it is sufficient to consider the case where $x_1$ is changed.
But multiplying $x_1$ by $\alpha$ leads to the multiplication of the first row and
the first column of the Gramian by $\alpha$ in the case of the ordinary bilinear
form (x, y). If, however, (x, y) is Hermitian, then the first row
of the Gramian is multiplied by $\alpha$ and the first column is multiplied
by $\bar{\alpha}$. It is from this that the property follows.

Property 4. If each of the vectors $x_1, x_2, \ldots, x_m$ is left (right)
orthogonal to all the preceding vectors, then for the Gramian we have

$$G(x_1, x_2, \ldots, x_m) = \prod_{i=1}^{m} (x_i, x_i). \tag{98.6}$$

Indeed, the left (right) orthogonality of each vector of the system
$x_1, x_2, \ldots, x_m$ to all the preceding vectors results in the Gram matrix
being right (left) triangular. But the determinant of a triangular
matrix is equal to the product of its diagonal elements, whence (98.6).
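
Properties 1 to 4 are easy to confirm on a concrete nonsymmetric form. The
code below is a sketch for illustration only: the matrices `B` and `T`, the
vectors and the helper names are assumptions, the form is the ordinary (not
Hermitian) one with $(x, y) = \sum_{i,j} x_i b_{ij} y_j$, and integer arithmetic
keeps every comparison exact.

```python
def form(x, y, B):
    """Scalar product (x, y) = sum over i, j of x[i] * B[i][j] * y[j]."""
    return sum(x[i] * B[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def det(M):
    """Determinant by expansion along the first row (fine for small sizes)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def gramian(vectors, B):
    """Determinant of the Gram matrix with entries (x_i, x_j)."""
    return det([[form(u, v, B) for v in vectors] for u in vectors])

B = [[1, 2, 0], [0, 1, 1], [1, 0, 3]]          # nonsymmetric bilinear form
x1, x2, x3 = [1, 0, 2], [0, 1, 1], [1, 1, 0]
g = gramian([x1, x2, x3], B)

# Property 1: interchange of two vectors.
g_swapped = gramian([x2, x1, x3], B)
# Property 2: x1 replaced by x1 + 2*x2 - 3*x3.
x1_new = [p + 2 * q - 3 * r for p, q, r in zip(x1, x2, x3)]
g_combined = gramian([x1_new, x2, x3], B)
# Property 3: x2 multiplied by 5, so the Gramian is multiplied by 25.
g_scaled = gramian([x1, [5 * v for v in x2], x3], B)
# Property 4: for an upper triangular form T each standard basis vector is
# left orthogonal to the preceding ones, and the Gramian is the product of
# the numbers (e_i, e_i).
T = [[2, 5, 7], [0, 3, 1], [0, 0, 4]]
e = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
g_triangular = gramian(e, T)
```

Here `g_swapped` and `g_combined` coincide with `g`, `g_scaled` equals
`25 * g`, and `g_triangular` equals 2 · 3 · 4 = 24.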

Especially interesting properties of the Gram matrix and the
Gramian arise in the cases where the bilinear form (x, y) is real
symmetric or Hermitian-symmetric, and positive definite. Of course
these cases imply simply that a bilinear metric space $K_n$ is in fact
Euclidean or, respectively, unitary.

In a Euclidean and a unitary space the Gram matrix for any basis
system is the matrix of a positive definite quadratic form (x, x).
According to Sylvester’s criterion all principal minors of the Gram
matrix are positive. Since any linearly independent system of vectors
can be supplemented to a basis, it follows that we have

Lemma 98.3. In a Euclidean and a unitary space the Gramian for
any linearly independent system of vectors is positive.

In a Euclidean space the Gramian has a very simple geometrical
interpretation. This is stated by

Theorem 98.2. In a Euclidean space the Gramian $G(x_1, x_2, \ldots, x_m)$
of a system of vectors $x_1, x_2, \ldots, x_m$ equals the square of the volume
$V^2(x_1, x_2, \ldots, x_m)$ of that system.

Proof. Consider a real-valued function $G^{1/2}(x_1, x_2, \ldots, x_m)$ of m
independent vector variables $x_1, x_2, \ldots, x_m$. It satisfies Properties
A and B of (36.3) according to Properties 2 and 3 of the Gramian.
In a Euclidean space each vector of any orthonormal system of vectors
is orthogonal to all the preceding vectors of the system. According
to (98.6) therefore $G^{1/2}(x_1, x_2, \ldots, x_m)$ satisfies Property C of (36.3)
as well. But it now follows from Theorem 36.1 that $G^{1/2}(x_1, x_2, \ldots, x_m)$
coincides with the volume of the system of vectors.

Corollary. For any system of vectors $x_1, x_2, \ldots, x_m$ of a Euclidean
space

$$0 \leq G(x_1, x_2, \ldots, x_m) \leq \prod_{i=1}^{m} (x_i, x_i),$$

equality on the left holding if and only if the system of vectors is
linearly dependent and equality on the right holding if and only if
either the system of vectors is orthogonal or there is a zero vector in it.

The validity of the statement follows from the first corollary of
Theorem 98.1 and from the property of the volume of a system of
vectors described by Hadamard’s inequality (36.1).
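
The geometrical meaning of the Gramian can be seen on small examples in
three-dimensional Euclidean space with the standard scalar product. The
sketch below is an illustration under that assumption; the vectors and the
function names are chosen for the example only. For three vectors the
Gramian is compared with the squared determinant of their coordinates, and
for two vectors with the squared length of their vector product, i.e. the
squared area of the parallelogram.

```python
def dot(x, y):
    """Standard Euclidean scalar product."""
    return sum(a * b for a, b in zip(x, y))

def det(M):
    """Determinant by expansion along the first row (fine for small sizes)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det([row[:j] + row[j + 1:] for row in M[1:]])
               for j in range(len(M)))

def gramian(vectors):
    """Determinant of the Gram matrix with entries (x_i, x_j)."""
    return det([[dot(u, v) for v in vectors] for u in vectors])

def cross(x, y):
    """Vector product in three-dimensional space."""
    return [x[1] * y[2] - x[2] * y[1],
            x[2] * y[0] - x[0] * y[2],
            x[0] * y[1] - x[1] * y[0]]

x1, x2, x3 = [1, 2, 2], [0, 3, 4], [1, 0, 1]
volume_squared = gramian([x1, x2, x3])      # equals det([x1, x2, x3]) ** 2
area_squared = gramian([x1, x2])            # equals |x1 x x2| ** 2
hadamard_bound = dot(x1, x1) * dot(x2, x2) * dot(x3, x3)
```

For these vectors `volume_squared` is 25, `area_squared` is 29 and
`hadamard_bound` is 450, so the right-hand inequality of the corollary is
strict, as it must be for a system that is not orthogonal.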

Corollary. For any system of vectors $x_1, x_2, \ldots, x_m$ of a Euclidean
space

$$G(x_1, \ldots, x_j, x_{j+1}, \ldots, x_m) \leq G(x_1, \ldots, x_j) \cdot G(x_{j+1}, \ldots, x_m),$$

equality holding if and only if either the sets of vectors $x_1, \ldots, x_j$
and $x_{j+1}, \ldots, x_m$ are orthogonal or one of them is a linearly dependent
system.

The proof is based on a simple analysis of formula (35.4). Recall
only the following. If $L_1 \subseteq L_2$, where $L_1$ and $L_2$ are any subspaces,
then $|\operatorname{ort}_{L_1} x| \geq |\operatorname{ort}_{L_2} x|$ for any vector x, equality holding only
if $x \perp L_2$.


Exercises 

1. Are the problems of finding expansions (98.1) and 
of solving system (98.2) equivalent? 

2. What yields the solution of system (98.2) if the vector x is not in the
span of vectors $x_1, \ldots, x_m$?

3. What does the Gram matrix (98.3) look like if:

the vectors $x_1, \ldots, x_m$ are mutually orthogonal,

each of the vectors $x_1, \ldots, x_m$ is left (right) orthogonal to all the preceding
(subsequent) vectors,

each of the vectors $x_1, \ldots, x_m$ is left (right) orthogonal to all the subsequent
(preceding) vectors,

each of the vectors $x_{i+1}, \ldots, x_m$ is left (right) orthogonal to each of the
vectors $x_1, \ldots, x_i$?

4. How does the Gram matrix change under elementary transformations of a 
system of vectors? 

5. Prove that if in an ordinary bilinear metric space (x, y) = 0 always implies
(y, x) = 0, then the scalar product is given by either a symmetric or a skew-
symmetric bilinear form.

6. Is the statement of Exercise 5 true for a Hermitian bilinear metric space?

7. Let G be the Gram matrix for some basis in a nonsingular Hermitian
bilinear metric space $K_n$. Prove that for an operator U with matrix $G^{-1}G'$ in
the same basis, for each vector $x \in K_n$

$$(Ux, Ux) = (x, x).$$

8. Prove that for any linear operator A in a Euclidean or a unitary space $K_n$
the ratio

$$k(A) = \frac{G(Ax_1, \ldots, Ax_m)}{G(x_1, \ldots, x_m)}$$

is independent of the vectors $x_1, \ldots, x_m$ and equals the product of the squares
of the moduli of the eigenvalues of A.

9. Prove that for any linearly independent system of vectors $x_1, \ldots, x_m$ of
a Euclidean or a unitary space and any vector z

$$\frac{G(x_1, \ldots, x_m, z)}{G(x_1, \ldots, x_m)} \leq \frac{G(x_1, \ldots, x_{m-1}, z)}{G(x_1, \ldots, x_{m-1})}.$$

99. Nonsingular subspaces 

Any linear subspace L of $K_n$ can be regarded
as a bilinear metric space relative to the same scalar product that
is introduced in $K_n$. In general the nonsingularity of $K_n$ does not
imply that of L and vice versa.

Theorem 99.1. For all subspaces of $K_n$ to be nonsingular it is necessary
and sufficient that the quadratic form (x, x) should have no isotropic
vectors.

Proof. Necessity. Let all subspaces in $K_n$ be nonsingular. Then so
are all one-dimensional subspaces. But the Gram matrix of a single
nonzero vector x reduces to the scalar product (x, x), which must be
nonzero by virtue of the nonsingularity of one-dimensional subspaces.

Sufficiency. Suppose $(x, x) \neq 0$ for every $x \neq 0$. Consider any
subspace L and a basis $x_1, x_2, \ldots, x_m$ in it. By Theorem 98.1 the
Gramian for this system is nonzero, i.e. L is nonsingular.

Corollary. For all subspaces in a bilinear metric space $K_n$ to be
nonsingular it is necessary and sufficient that all of its one-dimensional
subspaces should be nonsingular.

Corollary. In any ordinary complex bilinear metric space there are 
singular one-dimensional subspaces. 

To prove this statement it suffices to recall that in an ordinary 
complex bilinear metric space any quadratic form has isotropic 
vectors. 

When a quadratic form has isotropic vectors, there are both singular
and nonsingular subspaces in the bilinear metric space. If
a bilinear form (x, y) has a rank r, then it is clear that there can be
no nonsingular subspaces of dimension greater than r. But nonsingular
subspaces of dimension r exist. Such, for example, is the subspace
spanned by those vectors of the canonical basis for which the Gram
matrix coincides with the matrix M of (92.5).

We shall say that a set of vectors F of a bilinear metric space $K_n$ is
right orthogonal, left orthogonal or simply orthogonal to a set of vectors G
of $K_n$ if for every pair of vectors x and y, where $x \in F$ and $y \in G$, we have
a similar orthogonality relation. It is clear that the set of all the vectors
of $K_n$ right (left) orthogonal to each vector of F is a subspace.
It is called the right (left) orthogonal complement of F and designated
$F^\perp$ (${}^\perp F$).

In a Euclidean and a unitary space, subspaces ${}^\perp K_n$ and $K_n^\perp$ coincide
and consist only of a zero vector. In bilinear metric spaces they may
be distinct and do not necessarily consist only of a zero vector. The
subspaces ${}^\perp K_n$ and $K_n^\perp$ are called respectively the left and right null
subspaces in $K_n$.

Observe that for any set of vectors F there are always inclusions
$K_n^\perp \subseteq F^\perp$ and ${}^\perp K_n \subseteq {}^\perp F$ and that for any vectors from ${}^\perp K_n$ or $K_n^\perp$ the
Gram matrices turn out to be zero.

Theorem 99.2. The dimensions of the left and right null subspaces
are the same and equal the nullity of a bilinear form (x, y).

Proof. Choose in $K_n$ some basis $x_1, x_2, \ldots, x_n$. Take a vector x
of $K_n^\perp$ and represent it as an expansion with respect to the basis
according to (98.1) with m = n. The condition that x should be in $K_n^\perp$ is
equivalent to the conditions of the right orthogonality of x to each vector of
the basis. But these conditions lead to a homogeneous
system of the type (98.2) for determining the coefficients of the expansion.
It is known (see Section 48) that the set of solutions of that
system is a subspace whose dimension is equal to the nullity of the
Gram matrix or equivalently to the nullity of the bilinear form (x, y).
The proof for the left null subspace is similar.
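
The theorem can be illustrated on a singular form whose left and right null
subspaces have the same dimension but do not coincide. The sketch below is
only an example under stated assumptions: the scalar product is taken as
$(x, y) = \sum_{i,j} x_i b_{ij} y_j$ in the standard basis, and the matrix `B` and
all function names are hypothetical.

```python
from fractions import Fraction

def rank(A):
    """Rank of a matrix by Gaussian elimination over the rationals."""
    M = [[Fraction(v) for v in row] for row in A]
    r = 0
    for col in range(len(M[0])):
        pivot = next((i for i in range(r, len(M)) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, len(M)):
            f = M[i][col] / M[r][col]
            M[i] = [a - f * b for a, b in zip(M[i], M[r])]
        r += 1
        if r == len(M):
            break
    return r

def form(x, y, B):
    """Scalar product (x, y) = sum over i, j of x[i] * B[i][j] * y[j]."""
    return sum(x[i] * B[i][j] * y[j]
               for i in range(len(x)) for j in range(len(y)))

def unit(n, k):
    """The k-th vector of the standard basis of an n-dimensional space."""
    return [1 if i == k else 0 for i in range(n)]

def is_left_null(x, B):
    """True if (x, y) = 0 for all y; it suffices to test a basis."""
    return all(form(x, unit(len(B), k), B) == 0 for k in range(len(B)))

def is_right_null(y, B):
    """True if (x, y) = 0 for all x; it suffices to test a basis."""
    return all(form(unit(len(B), k), y, B) == 0 for k in range(len(B)))

B = [[1, 2, 0], [0, 1, 1], [1, 3, 1]]      # third row = first + second
nullity = len(B) - rank(B)
```

Here the rank is 2 and the nullity is 1. The vector (1, 1, -1) spans the left
null subspace and (2, -1, 1) spans the right one, and neither vector lies in
the other subspace.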

Corollary. For $K_n$ to be nonsingular it is necessary and sufficient
that the right and left null subspaces should consist only of a zero vector.

In Euclidean and unitary spaces any subspace is orthogonal to its
orthogonal complement and determines decomposition of the entire
space not only into a direct, but even into an orthogonal, sum of
those subspaces. Similar facts do not always hold in bilinear metric
spaces.

Theorem 99.3. Let L be a subspace in $K_n$. For a decomposition

$$K_n = L \dotplus L^\perp = L \dotplus {}^\perp L \tag{99.1}$$

to exist it is necessary and sufficient that L should be nonsingular.

Proof. Necessity. Suppose that decompositions (99.1) hold. We
shall look at L as a bilinear metric space with the same scalar product
as in $K_n$. The intersection $L \cap L^\perp$ is the right null subspace in L.
Since sums (99.1) are direct, that subspace contains only a zero vector.
According to the corollary of Theorem 99.2 this means that L
is nonsingular.

Sufficiency. If L is nonsingular, then $L \cap L^\perp$ will contain only
a zero vector and it is necessary to show that any vector $x \in K_n$ can
be represented as x = u + v, where $u \in L$ and $v \in L^\perp$. Take some
basis $x_1, x_2, \ldots, x_m$ in L. For the desired decomposition x = u + v
to exist it is necessary and sufficient that there should be a vector u
in L such that x - u is right orthogonal to the vectors $x_1, x_2, \ldots, x_m$.
Again we obtain a system of linear algebraic equations
with a Gram matrix to determine the coefficients of the expansion
of u with respect to $x_1, x_2, \ldots, x_m$. That matrix is nonsingular and
the system has a solution, i.e. the vector u exists. Of course all
that has been said concerning $L^\perp$ carries over completely to ${}^\perp L$.

Corollary. If a nonsingular subspace L is of dimension m, then the
dimension of $L^\perp$ and ${}^\perp L$ is n - m.

To prove this it suffices to use equation (19.1) and recall that the
dimension of $L \cap L^\perp$ and $L \cap {}^\perp L$ is zero.

Corollary. If a nonsingular subspace L has a maximum dimension,
then $L^\perp = K_n^\perp$ and ${}^\perp L = {}^\perp K_n$.

Indeed, let the rank of the bilinear form (x, y) be r. As we have
already noted, the subspace L will be of dimension r and $L^\perp$ and ${}^\perp L$
will be of dimension n - r. But $K_n^\perp$ and ${}^\perp K_n$ are also of dimension
n - r and, moreover, $K_n^\perp \subseteq L^\perp$ and ${}^\perp K_n \subseteq {}^\perp L$. Therefore
$K_n^\perp = L^\perp$ and ${}^\perp K_n = {}^\perp L$.

As to decompositions of the type (99.1) into orthogonal sums,
Theorem 99.3 yields

Corollary. Let $L$ be a nonsingular subspace of maximum dimension.
Decompositions (99.1) will be orthogonal if and only if the left and right
null subspaces coincide.

Indeed, if decompositions (99.1) are orthogonal, then $L^{\perp}$ is not
only right orthogonal but also left orthogonal to $L$, i.e. $L^{\perp} \subseteq {}^{\perp}L$.
Similarly we have ${}^{\perp}L \subseteq L^{\perp}$. Hence $L^{\perp} = {}^{\perp}L$. Since $L$ is of maximum
dimension, this means that $K_n^{\perp} = {}^{\perp}K_n$. If, however, the null
subspaces coincide, then it follows that $L^{\perp} = {}^{\perp}L$, i.e. that $L^{\perp}$ and ${}^{\perp}L$
are both right and left orthogonal to $L$, and that decompositions
(98.5) are orthogonal.

We can now give the answer to the question: What is the connection
between expansion (98.1) and the solution of system (98.2)?
Let vectors $x_1, \ldots, x_m$ form a basis of a nonsingular subspace $L$.
By Theorem 99.3 there are direct decompositions (99.1). Therefore
each vector $x$ of a bilinear metric space $K_n$ can be uniquely represented
as $x = u + v$, where $u \in L$ and $v \in {}^{\perp}L$. Recall that a vector $u$
is the projection of a vector $x$ onto $L$ parallel to the subspace ${}^{\perp}L$.
If we solve the system of linear algebraic equations (98.2) and
compose a vector

$$u = \alpha_1 x_1 + \alpha_2 x_2 + \ldots + \alpha_m x_m, \tag{99.2}$$

then it is that vector that will be the projection of $x$ onto $L$ parallel
to ${}^{\perp}L$. Indeed, $u$ is in $L$ and by (98.2) the difference $x - u$ is left
orthogonal to the vectors $x_1, \ldots, x_m$. Hence $x - u$ is in ${}^{\perp}L$. It is
clear that in order to obtain the projection of $x$ onto $L$ parallel to $L^{\perp}$
it is necessary to solve the following system:

$$\begin{aligned}
\alpha_1 (x_1, x_1) + \alpha_2 (x_1, x_2) + \ldots + \alpha_m (x_1, x_m) &= (x_1, x), \\
\alpha_1 (x_2, x_1) + \alpha_2 (x_2, x_2) + \ldots + \alpha_m (x_2, x_m) &= (x_2, x), \\
\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots\cdots & \\
\alpha_1 (x_m, x_1) + \alpha_2 (x_m, x_2) + \ldots + \alpha_m (x_m, x_m) &= (x_m, x)
\end{aligned} \tag{99.3}$$

and then calculate the desired projection according to (99.2). In the
case of a Hermitian bilinear metric space the coefficients $\alpha_j$ in (99.2)
are replaced by $\bar{\alpha}_j$.
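
As an illustration, the two projections can be computed by solving the two Gram systems directly. The following is a minimal sketch for an ordinary (real) bilinear metric space. It assumes that the scalar product is given in coordinates as $(x, y) = x'Gy$ and that (98.2) is the system expressing the left orthogonality of $x - u$ to the basis of $L$, as described above. The names `form`, `solve`, `project`, the matrix `G` and the sample vectors are illustrative assumptions and do not come from the text.

```python
from fractions import Fraction

def form(G, x, y):
    """Scalar product (x, y) = x' G y (assumed coordinate form)."""
    n = len(G)
    return sum(x[i] * G[i][j] * y[j] for i in range(n) for j in range(n))

def solve(M, b):
    """Solve M a = b exactly by Gauss-Jordan elimination; M must be nonsingular."""
    n = len(M)
    rows = [[Fraction(v) for v in M[i]] + [Fraction(b[i])] for i in range(n)]
    for c in range(n):
        # Raises StopIteration if no nonzero pivot exists (singular matrix).
        p = next(r for r in range(c, n) if rows[r][c] != 0)
        rows[c], rows[p] = rows[p], rows[c]
        piv = rows[c][c]
        rows[c] = [v / piv for v in rows[c]]
        for r in range(n):
            if r != c:
                f = rows[r][c]
                rows[r] = [a - f * t for a, t in zip(rows[r], rows[c])]
    return [rows[i][n] for i in range(n)]

def project(G, basis, x, parallel_to):
    """Projection of x onto L = span(basis).

    parallel_to="left":  x - u is left orthogonal to L  (system of type (98.2)).
    parallel_to="right": x - u is right orthogonal to L (system (99.3)).
    """
    m = len(basis)
    if parallel_to == "left":
        M = [[form(G, basis[j], basis[i]) for j in range(m)] for i in range(m)]
        b = [form(G, x, basis[i]) for i in range(m)]
    else:
        M = [[form(G, basis[i], basis[j]) for j in range(m)] for i in range(m)]
        b = [form(G, basis[i], x) for i in range(m)]
    alpha = solve(M, b)
    # Assemble u according to (99.2).
    return [sum(alpha[j] * basis[j][k] for j in range(m)) for k in range(len(x))]

# Hypothetical data: a nonsymmetric form on a three-dimensional space.
G = [[1, 1, 0], [0, 1, 1], [1, 0, 1]]
basis = [[1, 0, 0], [0, 1, 0]]          # basis of the subspace L
x = [1, 2, 3]

u_left = project(G, basis, x, "left")
u_right = project(G, basis, x, "right")
v_left = [xi - ui for xi, ui in zip(x, u_left)]
v_right = [xi - ui for xi, ui in zip(x, u_right)]
```

Since the form chosen here is not symmetric, the two projections differ, which agrees with the remark that similar facts do not always hold in bilinear metric spaces.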


Exercises 

1. Describe all nonsingular subspaces of maximum dimension.

2. Prove that for any set $L$ there are inclusions

$$L \subseteq {}^{\perp}(L^{\perp}), \qquad L \subseteq ({}^{\perp}L)^{\perp}.$$

In what cases does equality hold in these formulas?

3. Prove that if $L$ is a nonsingular subspace of maximum dimension in a
space $K_n$, then

$${}^{\perp}(L^{\perp}) = ({}^{\perp}L)^{\perp} = K_n.$$

4. Prove that if a scalar product is given by a symmetric or a skew-symmetric
bilinear form, then for any set $F$ we have $F^{\perp} = {}^{\perp}F$.

5. What is the connection between expansion (98.1) and the solution of
system (98.2) if the Gram matrix of a system $x_1, x_2, \ldots, x_m$ is singular?

6. Can there be a basis made up of isotropic vectors in a nonsingular space?

7. What can be said about the scalar product if the projections onto a fixed
subspace $L$ parallel to ${}^{\perp}L$ and $L^{\perp}$ coincide for all vectors?

8. What can be said about the scalar product if the projections of a fixed
vector onto the entire subspace $L$ parallel to ${}^{\perp}L$ and $L^{\perp}$ coincide?

9. Let $L$ be a nonsingular subspace of a Hermitian bilinear metric space $K_n$
of rank $r < n$. Prove that the following statements are equivalent:

the subspace ${}^{\perp}L$ is of dimension $n - r$,

the subspace $L^{\perp}$ is of dimension $n - r$,

the subspace $L$ is of dimension $r$,

the subspaces $L^{\perp}$ and $K_n^{\perp}$ coincide,

the subspaces ${}^{\perp}L$ and ${}^{\perp}K_n$ coincide,

the subspace ${}^{\perp}L$ consists of isotropic vectors and a zero vector,

the subspace $L^{\perp}$ consists of isotropic vectors and a zero vector,

the scalar product on ${}^{\perp}L$ is zero,

the scalar product on $L^{\perp}$ is zero.

10. What form will the Gram matrices have for the bases made up of bases
of a nonsingular subspace $L$ and the subspace ${}^{\perp}L$ ($L^{\perp}$)?

100. Orthogonality in bases 

In bilinear metric spaces the bases are nonequivalent.
There are such among them for which systems (98.2)
can be solved and studied particularly simply. This happens, for
example, when much of a Gram matrix consists of zero elements.
Depending on the form the Gram matrices have we shall consider
different classes of bases in bilinear metric spaces.

The simplest matrices are diagonal matrices. Diagonal Gram
matrices occur if and only if the bases consist of mutually orthogonal
vectors. Such bases will be called orthogonal. A system of vectors
forming an orthogonal basis of their span will be called an
orthogonal system.

Orthogonal bases can be defined in different ways. Definition in 
terms of mutual orthogonality is not always convenient to check, 
especially when the vectors of a basis are constructed successively 
starting from the first. It is sometimes useful therefore to employ the 
following definition: 

A basis $e_1, e_2, \ldots, e_n$ is said to be orthogonal if each of its vectors is
orthogonal to all the preceding vectors.

A Gram matrix for vectors that satisfy this definition is diagonal,
so both definitions are equivalent. In the general case a basis may
contain both nonisotropic and isotropic vectors. Vectors of an
orthogonal basis can always be interchanged so that nonisotropic
vectors are the first and isotropic vectors are the last. The diagonal form
of the Gram matrix is preserved of course.

Not all bilinear metric or Hermitian bilinear metric spaces have 
orthogonal bases. If there is at least one orthogonal basis, then this 
means that the matrix of the bilinear form (x, y) in the given basis 
has a diagonal form. Hence the matrix of ( x , y) must be congruent 
with a diagonal matrix in any other basis. The converse is also true 
of course. Therefore 

For an orthogonal basis to exist in a bilinear metric or Hermitian
bilinear metric space it is necessary and sufficient that the matrix of the
bilinear form $(x, y)$ should be congruent with a diagonal matrix. The
set of all orthogonal bases coincides up to an interchange of vectors
with the set of canonical bases of $(x, y)$.

Relying on our earlier studies of bilinear forms we can now say
that of the ordinary bilinear metric spaces only those spaces have
orthogonal bases in which the basic bilinear form $(x, y)$ is symmetric.
As to Hermitian bilinear metric spaces, it is spaces with a
Hermitian and skew-Hermitian basic bilinear form $(x, y)$, as well as
those with a bilinear form $(x, y)$ having the real or imaginary part of
the quadratic form $(x, x)$ of constant signs, that have orthogonal
bases.

Note from the outset one fundamental difference between ordinary
and Hermitian bilinear metric spaces with orthogonal bases. In an
ordinary bilinear metric space $K_n$ the existence of an orthogonal
basis implies the symmetry of the scalar product $(x, y)$ and this in
turn ensures the existence of an orthogonal basis in any subspace
of $K_n$. In a Hermitian bilinear metric space, in general the existence
of an orthogonal basis in the space itself does not automatically imply
the existence of an orthogonal basis in any of its subspaces. But
if a scalar product is given by a Hermitian-symmetric or
skew-Hermitian bilinear form, then the consequence is again valid.

Consider any orthogonal basis $e_1, e_2, \ldots, e_n$ of a bilinear metric
space $K_n$. There are as many isotropic and as many nonisotropic
vectors as are respectively the deficiency and rank of $K_n$. Taking
into account the law of inertia for quadratic forms we conclude that
if the bilinear form $(x, y)$ is real symmetric or Hermitian-symmetric,
then there will be the same number of vectors with positive and
negative values of $(e_i, e_i)$ in any orthogonal basis. Each number is
invariant for all orthogonal bases in $K_n$. Accordingly we shall speak
of a positive and a negative index, as well as signature, of spaces with
a symmetric form $(x, y)$. For bilinear metric spaces with a nonsymmetric
form $(x, y)$, we shall speak only of the rank and deficiency of
the spaces.

If $K_n$ is a nonsingular space, then no orthogonal basis $e_1, e_2, \ldots, e_n$
has isotropic vectors. In this case, for any vector $x \in K_n$

$$x = \sum_{j=1}^{n} \frac{(x, e_j)}{(e_j, e_j)}\, e_j. \tag{100.1}$$

Indeed, performing a right scalar multiplication of

$$x = \alpha_1 e_1 + \alpha_2 e_2 + \ldots + \alpha_n e_n \tag{100.2}$$

successively by the vectors $e_1, e_2, \ldots, e_n$ we get

$$\alpha_j = \frac{(x, e_j)}{(e_j, e_j)}$$

for every $j$. The vectors of an orthogonal basis in a nonsingular space
can be normed to yield an orthonormal basis. For an orthonormal
basis $e_1, e_2, \ldots, e_n$ there are relations $|(e_j, e_j)| = 1$ for every $j$.
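
Formula (100.1) can be tried out on a small indefinite example. The sketch below assumes a real symmetric scalar product written in coordinates as $(x, y) = x'Gy$. The function names `form` and `expand_orthogonal`, the matrix `G` and the vectors are hypothetical choices made only for illustration.

```python
from fractions import Fraction

def form(G, x, y):
    """Scalar product (x, y) = x' G y (assumed coordinate form)."""
    n = len(G)
    return sum(x[i] * G[i][j] * y[j] for i in range(n) for j in range(n))

def expand_orthogonal(G, basis, x):
    """Coefficients alpha_j = (x, e_j) / (e_j, e_j) of expansion (100.2)."""
    coeffs = []
    for e in basis:
        square = form(G, e, e)
        if square == 0:
            # An isotropic basis vector: formula (100.1) cannot be used.
            raise ValueError("isotropic vector in the basis")
        coeffs.append(Fraction(form(G, x, e), square))
    return coeffs

# Hypothetical data: an indefinite symmetric form on the plane.
G = [[0, 1], [1, 0]]
basis = [[1, 1], [1, -1]]     # (e_1, e_2) = 0, (e_1, e_1) = 2, (e_2, e_2) = -2
x = [3, 5]

alpha = expand_orthogonal(G, basis, x)
rebuilt = [sum(alpha[j] * basis[j][k] for j in range(2)) for k in range(2)]
```

Here one basis vector has a positive and the other a negative scalar square, so the space has signature zero, yet the expansion is found exactly as in a Euclidean space.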

In singular spaces, there must be isotropic vectors among the
vectors of any basis. Therefore representation (100.1) no longer holds
for expansion (100.2) of vectors of a space. Orthogonal bases prove
to be sufficiently useful in these spaces too, however. As an
illustration of their use, consider

Theorem 100.1. If there is an orthogonal basis in a space with a
scalar product, then the right and left null subspaces coincide.

Proof. Let there be an orthogonal basis $e_1, e_2, \ldots, e_n$ in a space $K_n$
of rank $r$. It is assumed that the vectors $e_1, \ldots, e_r$ are nonisotropic
and $e_{r+1}, \ldots, e_n$ are isotropic. We take a vector $x \in K_n$ and expand
it according to (100.2). Using representation (100.2) and taking into
account the orthogonality of the basis and the isotropy of the vectors
$e_{r+1}, \ldots, e_n$ it is easy to establish that $(x, e_j) = (e_j, x) = 0$ for
$r < j \leq n$. Hence $e_{r+1}, \ldots, e_n$ are simultaneously in the right and
in the left null subspace. But $e_{r+1}, \ldots, e_n$ are linearly independent
as vectors of a basis and equal in number the dimension of the
null subspaces, so both null subspaces coincide.

Corollary. If in a bilinear metric space the scalar product is given
by a symmetric or Hermitian-symmetric bilinear form, then its right
and left null subspaces coincide.

Corollary. In any orthogonal basis the isotropic vectors, and only the
isotropic vectors, form a basis of the common null subspace.

Corollary. If in a space with a scalar product there is an orthogonal
basis, then that space can be decomposed as an orthogonal sum of any
nonsingular subspace of maximum dimension and a null subspace.

The last corollary actually means that the study of any singular
spaces with orthogonal bases reduces to studying separately
nonsingular subspaces with orthogonal bases and subspaces on which
the scalar product is zero.

To know an orthogonal basis in a space is not only to be able to
find an orthogonal basis in the nonsingular subspace of maximum
dimension but also to obtain an explicit expansion of the orthogonal
projection of any vector onto that subspace with respect to its
orthogonal basis. Indeed, let $e_1, e_2, \ldots, e_n$ be an orthogonal basis in $K_n$,
let $e_1, \ldots, e_r$ be nonisotropic vectors and let $e_{r+1}, \ldots, e_n$ be
isotropic vectors. Denote by $L$ the subspace spanned by the vectors
$e_1, \ldots, e_r$. It is clear that it is nonsingular and of maximum
dimension, that $L^{\perp} = {}^{\perp}L$ and in addition

$$K_n = L \oplus L^{\perp}.$$


Any vector $x$ in $K_n$ can be represented in a unique way as a sum
$x = u + v$, where $u \in L$ and $v \in L^{\perp}$. Here $u$ is called the left
orthogonal projection of $x$ onto a subspace $L$ and $v$ is the left perpendicular
to that subspace. We write for $x$ expansion (100.2) with respect to
$e_1, e_2, \ldots, e_n$. Formula (100.1) no longer holds. Observe, however,
that the first $r$ terms in (100.2) form a vector $u$ and the last $n - r$
terms form a vector $v$. Performing a right scalar multiplication of
equation (100.2) successively by $e_1, \ldots, e_r$ we get

$$u = \sum_{j=1}^{r} \frac{(x, e_j)}{(e_j, e_j)}\, e_j.$$

The projection $v$ of a vector $x$ onto the null subspace is defined very
simply:

$$v = x - \sum_{j=1}^{r} \frac{(x, e_j)}{(e_j, e_j)}\, e_j.$$

The only thing that cannot be done now is to find the expansion of $v$
with respect to the vectors $e_{r+1}, \ldots, e_n$ using a scalar product,
although the expansion itself does exist.
As we have already said, not all bilinear metric and Hermitian
bilinear metric spaces have orthogonal bases. This circumstance
leads us to seek other classes of bases, more convenient from the
point of view of the scalar product given in space. A solution is
suggested by the canonical form of the matrix of a bilinear form.

A basis $e_1, e_2, \ldots, e_n$ is said to be pseudoorthogonal if each of its
vectors is left orthogonal to all the preceding vectors and each of
its isotropic vectors is left orthogonal to all vectors of the basis.
The system of vectors forming a pseudoorthogonal basis in their span
will be called a pseudoorthogonal system.

Observe that in this definition the left orthogonality of vectors
to all the preceding vectors can be replaced by the right orthogonality
of vectors to all the subsequent vectors. This gives the same
conditions.

The Gram matrix for the vectors of a pseudoorthogonal basis is
right trapezoidal. If the vectors of the basis are interchanged so that
the nonisotropic vectors are the first vectors and the isotropic vectors
are the last vectors, then the Gram matrix not only remains right
trapezoidal but will also have the canonical form (92.5). Our earlier
studies on reducing the matrix of a bilinear form to canonical form
give a complete answer to the question as to when there is a
pseudoorthogonal basis. Namely,

There is a pseudoorthogonal basis in any Hermitian bilinear metric
space, as well as in any ordinary bilinear metric space, except for spaces
with a skew-symmetric bilinear form $(x, y)$. The set of all
pseudoorthogonal bases coincides up to an interchange of vectors with the set of
canonical bases of $(x, y)$.

Every orthogonal basis is pseudoorthogonal. An ordinary bilinear 
metric space cannot contain simultaneously an orthogonal basis and 
a pseudoorthogonal basis that is not orthogonal. That is because the 
existence of at least one orthogonal basis implies the symmetry of 
all Gram matrices. A right trapezoidal matrix may be symmetric 
only if it is diagonal. A Hermitian bilinear metric space can contain 
simultaneously both an orthogonal basis and a pseudoorthogonal 
basis that is not orthogonal. This means that a right trapezoidal 
complex matrix may be Hermitian congruent with a diagonal matrix, 
which is also supported by example (92.8). 

If $K_n$ is a nonsingular space, then none of the pseudoorthogonal
bases has isotropic vectors, since a right trapezoidal matrix can be
nonsingular if and only if it is a right triangular matrix with nonzero
diagonal elements. In a nonsingular space, for the coefficients $\alpha_j$
of expansion (100.2) of a vector $x$ with respect to the vectors of a
pseudoorthogonal basis $e_1, e_2, \ldots, e_n$ we obtain a system of linear
algebraic equations with a left triangular matrix. Indeed, multiplying
equation (100.2) successively on the right by $e_1, e_2, \ldots, e_n$
we find that

$$\begin{aligned}
\alpha_1 (e_1, e_1) &= (x, e_1), \\
\alpha_1 (e_1, e_2) + \alpha_2 (e_2, e_2) &= (x, e_2), \\
\cdots\cdots\cdots\cdots\cdots\cdots & \\
\alpha_1 (e_1, e_n) + \alpha_2 (e_2, e_n) + \ldots + \alpha_n (e_n, e_n) &= (x, e_n).
\end{aligned} \tag{100.3}$$

From these we successively determine $\alpha_1, \alpha_2, \ldots, \alpha_n$. Of course,
the vectors of a pseudoorthogonal basis in a nonsingular space can
be normed to yield a pseudoorthonormal basis such that
$|(e_j, e_j)| = 1$ for every $j$.

Observe that the process of solving system (100.3) gives much
more than just a simple expansion of a vector $x$ with respect to
a pseudoorthogonal basis $e_1, e_2, \ldots, e_n$. Simultaneously, without
any extra costs we can determine all the vectors

$$u_k = \alpha_1 e_1 + \alpha_2 e_2 + \ldots + \alpha_k e_k.$$

The vectors $u_k$ form a sequence of projections of the same vector $x$
onto embedded subspaces

$$L_1 \subseteq L_2 \subseteq \ldots \subseteq L_n,$$

where $L_k$ is the span of vectors $e_1, e_2, \ldots, e_k$. If we look at $u_k$
as an "approximation" to the solution $x$, then the left orthogonality
of the "error" $v_k = x - u_k$ to $L_k$ in fact implies the left orthogonality
of $v_k$ to $u_1, u_2, \ldots, u_k$. We shall return to all these questions
somewhat later.
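
The successive determination of the coefficients from (100.3) is ordinary forward substitution. The sketch below works only with the right triangular Gram matrix $T$, where $T_{ij} = (e_i, e_j)$, and with the right-hand sides $b_i = (x, e_i)$, so no concrete vectors are needed. The names `forward_substitution` and `residuals` and all numbers are hypothetical; the ordinary (real) case is assumed.

```python
from fractions import Fraction

def forward_substitution(T, b):
    """Solve system (100.3): sum over j <= i of alpha_j * T[j][i] = b[i].

    T[i][j] = (e_i, e_j) is assumed right (upper) triangular with a
    nonzero diagonal, i.e. the basis is pseudoorthogonal and the space
    is nonsingular.
    """
    alphas = []
    for i in range(len(b)):
        known = sum(alphas[j] * T[j][i] for j in range(i))
        alphas.append((Fraction(b[i]) - known) / T[i][i])
    return alphas

def residuals(T, b, alphas, k):
    """Values (x - u_k, e_i) for i < k, where u_k uses the first k coefficients."""
    return [b[i] - sum(alphas[j] * T[j][i] for j in range(k)) for i in range(k)]

# Hypothetical Gram matrix of a pseudoorthogonal basis and right-hand sides.
T = [[2, 1, 4],
     [0, -1, 3],
     [0, 0, 5]]
b = [2, 0, 17]

alphas = forward_substitution(T, b)
# Coordinates of u_1, u_2, u_3 relative to e_1, e_2, e_3.
partial_sums = [alphas[:k] + [Fraction(0)] * (len(b) - k) for k in range(1, len(b) + 1)]
```

Each `partial_sums[k - 1]` is available as soon as the $k$-th coefficient is known, and `residuals` returns zeros for every $k$, which is the left orthogonality of the error $x - u_k$ to $L_k$.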

If a space $K_n$ is singular, then in general the existence of a
pseudoorthogonal basis does not guarantee the coincidence of the right
and left null subspaces and hence one cannot expect the space to be
decomposed as an orthogonal sum of some of its subspaces. But knowing
a pseudoorthogonal basis makes possible an efficient construction of
a decomposition of the space as a direct sum (99.1).

Suppose that in a space $K_n$ of rank $r$ there is a pseudoorthogonal
basis $e_1, e_2, \ldots, e_n$. It will be assumed that the vectors $e_1, \ldots, e_r$
are nonisotropic and $e_{r+1}, \ldots, e_n$ are isotropic. In a pseudoorthogonal
basis the isotropic vectors are left orthogonal to all vectors of
the basis, so they are left orthogonal to all vectors of $K_n$. But this
means that the isotropic vectors of the pseudoorthogonal basis form
a basis of the left null subspace ${}^{\perp}K_n$. Denote by $L$ the span of the
vectors $e_1, \ldots, e_r$. By the second corollary of Theorem 99.3

$$K_n = L \dotplus {}^{\perp}L = L \dotplus {}^{\perp}K_n,$$

with bases known for both $L$ and ${}^{\perp}K_n$. For $L$ the basis $e_1, \ldots, e_r$
is pseudoorthogonal.

So the study of any singular spaces with a pseudoorthogonal basis
reduces to a simultaneous study of nonsingular subspaces with
a pseudoorthogonal basis and subspaces on which the scalar product
is zero.

Any vector of $K_n$ can be uniquely represented as a sum
$x = u + v$, where $u \in L$, $v \in {}^{\perp}K_n$. If for a vector $x$ we write expansion
(100.2), then we again obtain a system of the type (100.3), but now
with a left trapezoidal matrix instead of a nonsingular left triangular
matrix, to determine the coefficients $\alpha_j$. Nevertheless we can
determine from that system the first coefficients $\alpha_1, \ldots, \alpha_r$ and we have

$$u = \alpha_1 e_1 + \alpha_2 e_2 + \ldots + \alpha_r e_r,$$

i.e. the projection of $x$ onto $L$ is determined completely relying on
the knowledge only of a pseudoorthogonal basis in $L$. Again
$v = x - u$ and again we cannot find an expansion of the vector $v$
with respect to the vectors $e_{r+1}, \ldots, e_n$ using a scalar product.

A pseudoorthogonal basis is a sufficiently general type of basis,
since almost all spaces have such a basis. As we already know, there
is no basis of this type only in ordinary bilinear metric spaces with
a skew-symmetric form $(x, y)$. For these spaces the most convenient
type of basis is obvious, it is certainly the canonical basis of Gram
matrices. In general it is possible to introduce a type of basis
covering all the above types of bases and existing in any space with a
scalar product. Few new facts follow, however, and we shall not
discuss it now.

Besides a single basis with certain orthogonality relations between
its vectors we shall sometimes deal with pairs of similar bases.

A basis $f_1, f_2, \ldots, f_n$ is said to be left (right) dual to a basis
$e_1, e_2, \ldots, e_n$ if $(f_i, e_j) = 0$ ($(e_i, f_j) = 0$) for $i \neq j$, with $(f_i, e_i)$
($(e_i, f_i)$) equalling 1 or 0 for every $i$.

A basis $f_1, f_2, \ldots, f_n$ is said to be left (right) pseudodual to a
basis $e_1, e_2, \ldots, e_n$ if $(f_i, e_j) = 0$ ($(e_j, f_i) = 0$) for $j < i$ when
$(f_i, e_i) = 1$ ($(e_i, f_i) = 1$), and for every $j$ when $(f_i, e_i) = 0$
($(e_i, f_i) = 0$).

It is easy to see that the matrix of a bilinear form $(x, y)$ is diagonal
in a pair of dual bases and right (left) trapezoidal in a pair of
pseudodual bases. The questions of the existence and construction of dual
and pseudodual bases are closely related to equivalence
transformations (91.4) of the matrix of a bilinear form $(x, y)$ as well as to the
factorization of that matrix. We shall turn to a detailed study of
such bases only as need arises. For the present we shall restrict
ourselves to their brief discussion.

Theorem 100.2. In any nonsingular space each basis has a right
and a left dual basis, which are unique.

Proof. Consider a basis $e_1, e_2, \ldots, e_n$ in a nonsingular space $K_n$
and let $G_e$ be the matrix of a bilinear form $(x, y)$ in that basis.
According to (91.4) finding a left (right) dual basis to $e_1, e_2, \ldots, e_n$ is
equivalent to determining a matrix $P$ ($Q$) for which $P'G_e$ ($G_eQ$) is
a unit matrix. Then $P$ ($Q$) is the coordinate transformation matrix
for a change from the basis $e_1, e_2, \ldots, e_n$ to the dual basis. Since
the space is nonsingular, so is the matrix $G_e$ and there is a unique
solution: $P = (G_e^{-1})'$ ($Q = G_e^{-1}$).
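
The construction in the proof is easy to check numerically. The following minimal sketch treats the ordinary (real) case and assumes that the columns of $P$ and $Q$ hold the coordinates of the dual basis vectors relative to $e_1, \ldots, e_n$. The helper names and the matrix `G_e` are illustrative assumptions.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse(M):
    """Exact inverse by Gauss-Jordan elimination; M must be nonsingular."""
    n = len(M)
    rows = [[Fraction(v) for v in M[i]] + [Fraction(int(i == j)) for j in range(n)]
            for i in range(n)]
    for c in range(n):
        # Raises StopIteration if M is singular.
        p = next(r for r in range(c, n) if rows[r][c] != 0)
        rows[c], rows[p] = rows[p], rows[c]
        piv = rows[c][c]
        rows[c] = [v / piv for v in rows[c]]
        for r in range(n):
            if r != c:
                f = rows[r][c]
                rows[r] = [a - f * t for a, t in zip(rows[r], rows[c])]
    return [row[n:] for row in rows]

# Hypothetical nonsingular, nonsymmetric Gram matrix of the basis e_1, e_2, e_3.
G_e = [[2, 1, 0],
       [0, 1, 1],
       [1, 0, 1]]

P = transpose(inverse(G_e))   # left dual basis:  P' G_e = E
Q = inverse(G_e)              # right dual basis: G_e Q = E

left_gram = matmul(transpose(P), G_e)    # entries (f_i, e_j) for the left dual basis
right_gram = matmul(G_e, Q)              # entries (e_i, f_j) for the right dual basis
```

Because `G_e` is not symmetric, `P` and `Q` differ, so the left and the right dual basis are different bases here.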

Corollary. In any nonsingular space each basis has a left and a right 
pseudodual basis. 

Indeed, every left (right) dual basis is simultaneously a left (right) 
pseudodual basis. 

Taking into account the form of the matrix of a bilinear form (x, y) 
it is easy to establish that if in a nonsingular space we change from 
a left (right) dual basis to another basis, one with a left triangular 
coordinate transformation matrix with unit diagonal elements, then 
the new basis is left (right) pseudodual. 

Exercises 

1. Let a scalar product be symmetric. Is the number of
vectors with positive, negative and zero values of $(e_i, e_i)$ invariant for
nonorthogonal bases $e_1, e_2, \ldots, e_n$?

2. How can any real or complex vector space be converted into a bilinear
metric space with a symmetric scalar product with a given rank and signature?

3. An orthogonal basis has no isotropic vectors in a nonsingular space. Can
there be a basis consisting of isotropic vectors in such a space?

4. Prove that as functions of vectors of a bilinear metric space orthogonal
projection and perpendicular are linear operators.

5. What form has the Gram matrix for a pseudoorthogonal basis if the right
and left null spaces coincide?

6. Prove that in any ordinary or Hermitian bilinear metric space there is
a basis in which the Gram matrix is a right block-triangular matrix with $1 \times 1$
and $2 \times 2$ blocks along the diagonal.

7. How can the coefficients of an expansion of a vector with respect to
a basis for which some dual or pseudodual basis is known be determined?

8. Prove that in a nonsingular space the coordinate transformation matrix
for a change from one basis, pseudodual to a given basis, to any other
pseudodual basis of the same name is left triangular.

101. Operators and bilinear forms 

If we have a linear operator in an ordinary or
Hermitian bilinear metric space, then of course all the results
obtained earlier for operators in a real or a complex space hold. We
shall therefore study only additional properties of operators due to
the presence in space of a scalar product.

One of the major objects is the adjoint operator. In a Euclidean 
and a unitary space the adjoint operator was introduced using a 
scalar product, but in investigating its properties wide use was made 
of the existence in space of an orthonormal basis. Now we cannot 
take this way, since there may be no orthogonal basis in the general 
bilinear metric space. We shall make our studies in the Hermitian 
bilinear metric space. Changes for the ordinary bilinear metric space 
are very simple. 
An operator $A^*$ (${}^*\!A$) in a Hermitian bilinear metric space $K_n$
is said to be a right (left) adjoint operator to an operator $A$ in $K_n$
if for any vectors $x, y \in K_n$

$$(Ax, y) = (x, A^*y) \quad ((x, Ay) = ({}^*\!Ax, y)). \tag{101.1}$$

Take a basis $e_1, e_2, \ldots, e_n$ in $K_n$ and let $G_e$ be the Gram matrix
for that basis. Denote by $A_e$ the matrix of $A$ in the basis
$e_1, e_2, \ldots, e_n$ and by $A_e^*$ and ${}^*\!A_e$ the matrices of $A^*$ and ${}^*\!A$ if they exist.

Theorem 101.1. For any linear operator $A$ in a nonsingular Hermitian
bilinear metric space there are unique adjoint operators $A^*$ and ${}^*\!A$,
with

$$A_e^* = \bar{G}_e^{-1} \bar{A}_e' \bar{G}_e, \qquad {}^*\!A_e = (G_e^{-1})' \bar{A}_e' G_e'. \tag{101.2}$$

Proof. If $A^*$ exists, then according to (101.1), in matrix notation
of the type (61.2) and (91.7), we have

$$\begin{aligned}
(Ax, y) &= (Ax)_e' G_e \bar{y}_e = x_e' (A_e' G_e) \bar{y}_e, \\
(x, A^*y) &= x_e' G_e \overline{(A^*y)_e} = x_e' (G_e \bar{A}_e^*) \bar{y}_e.
\end{aligned}$$

The right-hand sides of these relations must coincide for all vectors
$x_e$ and $y_e$, so $A_e' G_e = G_e \bar{A}_e^*$, whence follows the first of the equations
(101.2). Similarly,

$$\begin{aligned}
(x, Ay) &= x_e' G_e \overline{(Ay)_e} = x_e' (G_e \bar{A}_e) \bar{y}_e, \\
({}^*\!Ax, y) &= ({}^*\!Ax)_e' G_e \bar{y}_e = x_e' ({}^*\!A_e' G_e) \bar{y}_e,
\end{aligned}$$

so $G_e \bar{A}_e = {}^*\!A_e' G_e$ and we obtain the second of the equations (101.2).

Equations (101.2) imply that if adjoint operators exist, then they 
are unique. Take now those equations as a form of assigning the 
right and left adjoint operators. It is easy to verify directly that the 
operators thus constructed are linear and satisfy relations (101.1). 
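
The direct verification can be sketched numerically. To keep the arithmetic exact, the sketch below treats the ordinary (real) bilinear case, in which all complex conjugations in (101.2) drop out, so that the right adjoint has the matrix $G_e^{-1} A_e' G_e$ and the left adjoint the matrix $(G_e^{-1})' A_e' G_e'$. The helper names and the sample matrices are hypothetical.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse(M):
    """Exact inverse by Gauss-Jordan elimination; M must be nonsingular."""
    n = len(M)
    rows = [[Fraction(v) for v in M[i]] + [Fraction(int(i == j)) for j in range(n)]
            for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if rows[r][c] != 0)
        rows[c], rows[p] = rows[p], rows[c]
        piv = rows[c][c]
        rows[c] = [v / piv for v in rows[c]]
        for r in range(n):
            if r != c:
                f = rows[r][c]
                rows[r] = [a - f * t for a, t in zip(rows[r], rows[c])]
    return [row[n:] for row in rows]

def right_adjoint(A, G):
    """Real analogue of the first formula in (101.2)."""
    return matmul(matmul(inverse(G), transpose(A)), G)

def left_adjoint(A, G):
    """Real analogue of the second formula in (101.2)."""
    return matmul(matmul(transpose(inverse(G)), transpose(A)), transpose(G))

# Hypothetical data: nonsymmetric nonsingular Gram matrix and an operator matrix.
G = [[1, 1], [0, 1]]
A = [[1, 2], [3, 4]]

A_right = right_adjoint(A, G)
A_left = left_adjoint(A, G)

# (Ax, y) = (x, A*y) for all x, y is the matrix identity A'G = G A*.
lhs_right, rhs_right = matmul(transpose(A), G), matmul(G, A_right)
# (x, Ay) = (*Ax, y) for all x, y is the matrix identity G A = (*A)' G.
lhs_left, rhs_left = matmul(G, A), matmul(transpose(A_left), G)
```

For this nonsymmetric `G` the two adjoints have different matrices, so in general the right and left adjoint operators need not coincide.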

Corollary. If a Hermitian bilinear form $(x, y)$ is symmetric or
skew-symmetric, then the right and left adjoint operators coincide.

Indeed, in these cases $G_e = \pm\bar{G}_e'$ for any basis $e_1, e_2, \ldots, e_n$.
According to (101.2) we now conclude that $A_e^* = {}^*\!A_e$.

It follows from the corollary that the right and left adjoint
operators coincide in a unitary space. But this fact can be established in
another way. If in a unitary space an orthonormal basis
$e_1, e_2, \ldots, e_n$ is taken, then $G_e = E$ and we obtain the well-known
equations $A_e^* = {}^*\!A_e = \bar{A}_e'$.

The adjoint operators are connected with $A$ by certain relations.
Note some of them, for example, for the right adjoint operator:

$$\begin{aligned}
(A + B)^* &= A^* + B^*, \\
(\alpha A)^* &= \bar{\alpha} A^*, \\
(AB)^* &= B^*A^*, \\
(A^*)^{-1} &= (A^{-1})^*.
\end{aligned} \tag{101.3}$$

For the left adjoint operator the relations are similar. All relations
can be proved according to the same scheme using representations
(101.2) for the matrices of adjoint operators. Therefore we shall
prove only the validity of the last property. We have

$$(A_e^*)^{-1} = \bar{G}_e^{-1} (\bar{A}_e')^{-1} (\bar{G}_e^{-1})^{-1} = \bar{G}_e^{-1} (\bar{A}_e^{-1})' \bar{G}_e = (A_e^{-1})^*.$$

Comparing formulas (75.4) and (101.3) we can see the absence in
(101.3) of the analogue of the first of the relations (75.4). It now looks
like this:

$$({}^*\!A)^* = {}^*(A^*) = A. \tag{101.4}$$

To prove its validity we again turn to representations (101.2) and get

$$\begin{aligned}
({}^*\!A_e)^* &= \bar{G}_e^{-1} (\overline{{}^*\!A_e})' \bar{G}_e = \bar{G}_e^{-1} \bigl((\bar{G}_e^{-1})' A_e' \bar{G}_e'\bigr)' \bar{G}_e = \bar{G}_e^{-1} \bar{G}_e A_e \bar{G}_e^{-1} \bar{G}_e = A_e, \\
{}^*(A_e^*) &= (G_e^{-1})' (\overline{A_e^*})' G_e' = (G_e^{-1})' (G_e^{-1} A_e' G_e)' G_e' = (G_e^{-1})' G_e' A_e (G_e^{-1})' G_e' = A_e,
\end{aligned}$$

i.e. relations (101.4) do hold.

Theorem 101.2. If in a nonsingular Hermitian bilinear metric
space an operator $A$ has in some basis a matrix $J$, then in a right (left)
dual basis the operator $A^*$ (${}^*\!A$) has a matrix $J^*$.

Proof. Let $A$ have a matrix $J$ in a basis $e_1, e_2, \ldots, e_n$. Consider
a right dual basis $f_1, f_2, \ldots, f_n$. Denote by $G_e$, $G_f$ and $G_{ef} = E$
the matrices of a bilinear form $(x, y)$ in the corresponding bases.
If $P$ is a coordinate transformation matrix for a change from the
first basis to the second, then we have

$$G_e = G_{ef} \bar{P}^{-1} = \bar{P}^{-1}, \qquad G_f = P' G_{ef} = P'$$

and then, taking into account (63.7) and (101.2), we get

$$A_f^* = \bar{G}_f^{-1} \bar{A}_f' \bar{G}_f = \bar{G}_f^{-1} (\overline{P^{-1} J P})' \bar{G}_f = \bar{G}_f^{-1} \bar{G}_f \bar{J}' \bar{G}_f^{-1} \bar{G}_f = J^*.$$

If, however, $A$ has a matrix $J$ in a basis $f_1, f_2, \ldots, f_n$, then for
that basis the basis $e_1, e_2, \ldots, e_n$ is the left dual basis and we now
find

$${}^*\!A_e = (G_e^{-1})' \bar{A}_e' G_e' = (G_e^{-1})' (\overline{P J P^{-1}})' G_e' = (G_e^{-1})' G_e' \bar{J}' (G_e^{-1})' G_e' = J^*.$$

This theorem is as significant in the study of adjoint operators in 
Hermitian bilinear metric spaces as Theorem 75.2 is in unitary 
spaces. It follows from it in particular that the right and left adjoint 
operators A* and *A have the same eigenvalues, complex conjugate 
to those of A, that the right and left adjoint operators A* and *A 
have a simple structure, if A has a simple structure, and so on. 

Besides a scalar product $(x, y)$ other Hermitian bilinear forms can
be given in a Hermitian bilinear metric space. Consider, for example,
functions of the form $(Ax, y)$ and $(x, Ay)$, where $A$ is an arbitrary
linear operator. It is not hard to see that they are Hermitian bilinear
forms. Different operators give different forms in any nonsingular
space $K_n$. Indeed, if $A$ and $B$ are different operators, then $Ax \neq Bx$
at least for one vector $x$. Suppose $(Ax, y) = (Bx, y)$ for each $y \in K_n$.
It follows that $((A - B)x, y) = 0$ for each $y \in K_n$, i.e. that
$(A - B)x \in {}^{\perp}K_n$. But in a nonsingular space the subspace ${}^{\perp}K_n$
consists only of a zero vector, so $Ax = Bx$, contrary to the choice of $x$.

Theorem 101.3. In a nonsingular Hermitian bilinear metric space $K_n$
any Hermitian bilinear form $\varphi(x, y)$ can be uniquely represented as

$$\varphi(x, y) = (Ax, y) = (x, By),$$

where $A$ and $B$ are some linear operators in $K_n$.

Proof. Choose in $K_n$ some basis $e_1, e_2, \ldots, e_n$ and let $G_e$ be
the Gram matrix in that basis and $\Phi_e$ the matrix of the form $\varphi(x, y)$.
We have

$$\begin{aligned}
\varphi(x, y) &= x_e' \Phi_e \bar{y}_e = x_e' \Phi_e G_e^{-1} G_e \bar{y}_e = \bigl((G_e^{-1})' \Phi_e' x_e\bigr)' G_e \bar{y}_e \\
&= x_e' G_e G_e^{-1} \Phi_e \bar{y}_e = x_e' G_e \overline{(\bar{G}_e^{-1} \bar{\Phi}_e y_e)}.
\end{aligned}$$

Now the matrices $A_e$ and $B_e$ of the desired operators $A$ and $B$ are
defined by

$$A_e = (G_e^{-1})' \Phi_e', \qquad B_e = \bar{G}_e^{-1} \bar{\Phi}_e. \tag{101.5}$$

The uniqueness of A and B was proved earlier. 
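
Formulas (101.5) can be checked on a small example. The sketch below again assumes the ordinary (real) case, where the conjugations drop out and the matrices become $A_e = (G_e^{-1})'\Phi_e'$ and $B_e = G_e^{-1}\Phi_e$. The representation $\varphi(x, y) = (Ax, y) = (x, By)$ for all vectors is then the pair of matrix identities $A_e' G_e = \Phi_e$ and $G_e B_e = \Phi_e$. The helper names and the sample matrices are hypothetical.

```python
from fractions import Fraction

def transpose(M):
    return [list(col) for col in zip(*M)]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def inverse(M):
    """Exact inverse by Gauss-Jordan elimination; M must be nonsingular."""
    n = len(M)
    rows = [[Fraction(v) for v in M[i]] + [Fraction(int(i == j)) for j in range(n)]
            for i in range(n)]
    for c in range(n):
        p = next(r for r in range(c, n) if rows[r][c] != 0)
        rows[c], rows[p] = rows[p], rows[c]
        piv = rows[c][c]
        rows[c] = [v / piv for v in rows[c]]
        for r in range(n):
            if r != c:
                f = rows[r][c]
                rows[r] = [a - f * t for a, t in zip(rows[r], rows[c])]
    return [row[n:] for row in rows]

# Hypothetical data: Gram matrix of the scalar product and matrix of another form.
G = [[1, 1], [0, 1]]
Phi = [[1, 2], [3, 4]]

A_e = matmul(transpose(inverse(G)), transpose(Phi))   # phi(x, y) = (Ax, y)
B_e = matmul(inverse(G), Phi)                         # phi(x, y) = (x, By)

form_via_A = matmul(transpose(A_e), G)   # matrix of the form (Ax, y)
form_via_B = matmul(G, B_e)              # matrix of the form (x, By)
```

Both reconstructed matrices equal `Phi`, while `A_e` and `B_e` themselves differ because `G` is not symmetric.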

An adjoint operator can be defined using a scalar product.
Therefore, if different scalar products are introduced in a vector space, then
the same linear operator will have different adjoint operators.
Suppose in a vector space together with a scalar product given by a
bilinear form $(x, y)$ we introduce scalar products given by forms
$(Mx, y)$ and $(x, My)$. Label with a subscript $M$ on the left (right) the
adjoint operators relating to the scalar product $(Mx, y)$ ($(x, My)$).

Theorem 101.4. For any operator $A$ and a nonsingular operator $M$

$$\begin{aligned}
{}_M A^* &= (MAM^{-1})^*, & {}^*_M A &= M^{-1} ({}^*\!A) M, \\
A^*_M &= M^{-1} A^* M, & {}^*\!A_M &= {}^*(MAM^{-1}).
\end{aligned} \tag{101.6}$$

Proof. Choose some basis $e_1, e_2, \ldots, e_n$ and let $G_e$ and $M_e$ be the
matrices of a bilinear form $(x, y)$ and of the operator $M$ in that
basis respectively. According to (101.5) the matrix of a bilinear
form $(Mx, y)$ is equal to $M_e' G_e$. According to (101.2) we now find

$$\begin{aligned}
{}_M A_e^* &= (\overline{M_e' G_e})^{-1} \bar{A}_e' (\overline{M_e' G_e}) = \bar{G}_e^{-1} \bigl((\bar{M}_e^{-1})' \bar{A}_e' \bar{M}_e'\bigr) \bar{G}_e \\
&= \bar{G}_e^{-1} (\overline{M_e A_e M_e^{-1}})' \bar{G}_e = (MAM^{-1})_e^*, \\
{}^*_M A_e &= \bigl((M_e' G_e)^{-1}\bigr)' \bar{A}_e' (M_e' G_e)' = M_e^{-1} (G_e^{-1})' \bar{A}_e' G_e' M_e = M_e^{-1}\, {}^*\!A_e\, M_e.
\end{aligned}$$

These matrix equations prove the validity of the first group of the
operator equations (101.6). The second group follows trivially from
the first, if we take into account the equation $(x, My) = ({}^*\!Mx, y)$
and relations (101.2).

There are different types of operators in the Hermitian bilinear
metric space. An operator $A$ is said to be Hermitian or self-adjoint,
if for any $x, y \in K_n$

$$(Ax, y) = (x, Ay),$$

and skew-Hermitian or skew-adjoint, if

$$(Ax, y) = -(x, Ay).$$

Hence respectively

$$A = A^* = {}^*\!A, \qquad A = -A^* = -{}^*\!A.$$

An operator $A$ is said to be isometric if for any $x, y \in K_n$

$$(Ax, Ay) = (x, y).$$

This leads to the equations

$${}^*\!AA = A^*A.$$

In an ordinary bilinear metric space the analogues of the
Hermitian and the skew-Hermitian operator are called a symmetric and
a skew-symmetric operator respectively. In what follows we shall
often deal with operators defined by the equation

$$A^* = \alpha E + \beta A \tag{101.7}$$

for some numbers $\alpha$ and $\beta$.

By far not all properties of operators of a special form carry over 
from the unitary to the Hermitian bilinear metric space, although 
they do have something in common. We shall not discuss all these 
questions. 


Exercises 

1. How are the characteristic polynomials of operators
$A$, $A^*$ and ${}^*\!A$ related?

2. Let a subspace $L$ be invariant under an operator $A$. Prove that the
subspace $L^{\perp}$ (${}^{\perp}L$) is invariant under $A^*$ (${}^*\!A$).

3. Prove that any eigenvector of an operator $A$ corresponding to an
eigenvalue $\lambda$ is left (right) orthogonal to any eigenvector of $A^*$ (${}^*\!A$) corresponding to
an eigenvalue $\mu \neq \bar{\lambda}$.

4. Prove that any root vector of an operator $A$ corresponding to an
eigenvalue $\lambda$ is left (right) orthogonal to any root vector of $A^*$ (${}^*\!A$) corresponding to
an eigenvalue $\mu \neq \bar{\lambda}$.

5. Prove that the eigenvalues of a Hermitian (skew-Hermitian) operator
corresponding to nonisotropic eigenvectors are real (pure imaginary).

6. Prove that the moduli of the eigenvalues of an isometric operator
corresponding to nonisotropic eigenvectors equal unity.

7. Suppose in a nonsingular space the scalar product is Hermitian-symmetric.
Prove that if $A$ is a Hermitian (skew-Hermitian) operator, then the bilinear
form $(Ax, y)$ is Hermitian-symmetric (skew-symmetric).

8. Suppose in a nonsingular space the scalar product is Hermitian-symmetric.
Prove that if the bilinear form $(Ax, y)$ is Hermitian-symmetric
(skew-symmetric), then the operator $A$ is Hermitian (skew-Hermitian).

9. How do the statements of Exercises 7 and 8 change if the scalar product is
skew-Hermitian?

10. Prove that if an operator $A$ satisfying condition (101.7) has at least two
distinct eigenvalues, then $|\beta| = 1$.

102. Bilinear metric isomorphism 

It was shown in the study of Euclidean and 
unitary spaces that to within isomorphism there is only one space of 
each dimension n. For bilinear metric spaces the situation is more 
complicated. 

We introduce the concept of isomorphism. We shall say that ordinary 
or Hermitian bilinear metric spaces over the same number field 
are isomorphic if they are isomorphic as vector spaces and the scalar 
products of pairs of corresponding vectors are equal to each other. 

It follows from this definition that in isomorphic spaces the Gram 
matrices of the systems of corresponding vectors coincide. The converse 
is also true. If in bilinear metric spaces over the same number 
field there are bases in which the Gram matrices coincide, then 
those spaces are isomorphic. Indeed, by establishing a correspondence 
between bases with equal Gram matrices we ensure that scalar 
products coincide for any pair of vectors in the bases and hence for 
any pair of vectors. 

Theorem 102.1. Ordinary (Hermitian) bilinear metric spaces over 
the same number field are isomorphic if and only if the Gram matrices 
of arbitrary bases of those spaces are congruent (Hermitian-congruent). 

Proof. Necessity. The Gram matrices of all bases of the same 
space are congruent and they coincide on the corresponding bases of 
different spaces. By virtue of the transitivity of the congruence 
relation, the Gram matrices of arbitrary bases of isomorphic spaces are 
congruent. 

Sufficiency. If the Gram matrices of arbitrary bases of bilinear 
metric spaces are congruent, then there are bases in different spaces 
on which the Gram matrices coincide. But then the spaces are 
isomorphic. 

The theorem says that the problem of classifying bilinear metric 
spaces is equivalent to that of classifying bilinear forms to within 
congruence. Consider some classes of bilinear metric spaces. 

A real bilinear metric space $K_n$ is said to be pseudo-Euclidean if the 
scalar product is given by a nonsingular symmetric bilinear form. 

For an arbitrary basis of a pseudo-Euclidean space the Gram matrix 
is real symmetric and, as we know, congruent with a diagonal 
matrix with elements ±1. This means that in every pseudo-Euclidean 
space there is a basis in which the scalar product $(x, y)$ of vectors 
$x$ and $y$ with coordinates $\xi_1, \dots, \xi_n$ and $\eta_1, \dots, \eta_n$ is given by 
the formula 

$$(x, y) = \xi_1\eta_1 + \dots + \xi_s\eta_s - \xi_{s+1}\eta_{s+1} - \dots - \xi_n\eta_n.$$

To within isomorphism pseudo-Euclidean spaces are defined by any two 
of their characteristics, for example the dimension and the signature, 
or the positive and the negative index. Of particular interest to 
physics among the pseudo-Euclidean spaces is the four-dimensional space 
with a positive index equal to unity. This is the so-called Minkowski 
space-time or Minkowski universe. It is the space-time of special relativity. 
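
The canonical formula above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name `pseudo_euclidean_product` and the sample vectors do not come from the text, and the positive index s is passed explicitly.

```python
def pseudo_euclidean_product(x, y, s):
    """Canonical pseudo-Euclidean scalar product: the first s coordinate
    products enter with '+', the remaining ones with '-'."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same number of coordinates")
    return sum((1 if i < s else -1) * a * b
               for i, (a, b) in enumerate(zip(x, y)))


# Four-dimensional space with positive index 1 (the Minkowski case).
light_like = [1, 1, 0, 0]     # hypothetical sample vector
time_like = [2, 0, 0, 1]      # hypothetical sample vector
print(pseudo_euclidean_product(light_like, light_like, 1))  # prints 0
print(pseudo_euclidean_product(time_like, time_like, 1))    # prints 3
```

The first vector is nonzero but its scalar square vanishes, so it is isotropic; such vectors are exactly what may cause degenerate situations in the orthogonalization processes of the next chapter.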

A real bilinear metric space $K_n$ is said to be symplectic if the scalar 
product is given by a nonsingular skew-symmetric bilinear form. 

The Gram matrix for any symplectic space is skew-symmetric and 
therefore congruent with a block-diagonal matrix with blocks of 
the form 

$$\begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}.$$

Consequently the dimension of a symplectic space is always even and, 
to within isomorphism, there is only one symplectic space of a given 
even dimension. There is a basis in such a space in which the scalar 
product of vectors $x$ and $y$ with coordinates $\xi_1, \dots, \xi_n$ and 
$\eta_1, \dots, \eta_n$ has the form 

$$(x, y) = \xi_1\eta_2 - \xi_2\eta_1 + \dots + \xi_{n-1}\eta_n - \xi_n\eta_{n-1}.$$

A complex bilinear metric space $K_n$ is said to be complex Euclidean 
if the scalar product is given by a nonsingular symmetric bilinear 
form. 

For any basis its Gram matrix is complex symmetric and congruent 
with a unit matrix. To within isomorphism there is only one 
complex Euclidean space of each dimension. In any complex Euclidean 
space there is a basis in which the scalar product of vectors $x$ 
and $y$ is as follows: 

$$(x, y) = \xi_1\eta_1 + \xi_2\eta_2 + \dots + \xi_n\eta_n.$$

A complex Hermitian bilinear metric space is said to be pseudounitary 
if the scalar product is given by a nonsingular Hermitian-symmetric 
bilinear form. 

The Gram matrix for any pseudounitary space is Hermitian. It is 
Hermitian-congruent with a real diagonal matrix with elements ±1. 
There is always a basis therefore in which the scalar product of 
vectors $x$ and $y$ has the form 

$$(x, y) = \xi_1\bar\eta_1 + \dots + \xi_s\bar\eta_s - \xi_{s+1}\bar\eta_{s+1} - \dots - \xi_n\bar\eta_n,$$

where $\xi_1, \dots, \xi_n$ and $\eta_1, \dots, \eta_n$ are the coordinates of $x$ and $y$. 
Again to within isomorphism a pseudounitary space is uniquely 
defined by any two of its characteristics, for example the dimension 
and the signature, or the positive and the negative index. 


Exercises 

1. Prove that in isomorphic spaces to orthogonal 
(pseudoorthogonal, dual, pseudodual) bases there correspond orthogonal 
(pseudoorthogonal, dual, pseudodual) bases. 

2. Prove that in isomorphic spaces to nonsingular subspaces there 
correspond nonsingular subspaces. 

3. Prove that in isomorphic spaces perpendicular and projection go over 
into perpendicular and projection respectively. 

4. Prove that in isomorphic spaces the Gramians of the corresponding systems 
of vectors are equal. 



CHAPTER 13 


Bilinear Forms 
in Computational Processes 


103. Orthogonalization processes 

One of the major concepts associated with 
any bilinear metric space is that of orthogonality. We have often 
seen what an important role orthogonal systems of vectors and especially 
orthogonal bases play in the study of Euclidean and unitary 
spaces. Of no less significance is the role played by bases with 
orthogonal vectors in other spaces. Up to now, however, most of our 
reasoning has been connected with proving the existence of such systems 
and not with the processes of constructing them. The only exception 
in a sense is the general method of transforming the matrices of 
bilinear forms to canonical form and the related construction of 
canonical bases. In view of the importance of orthogonal, 
pseudoorthogonal and other similar systems for designing diverse 
computational algorithms we consider now a general process of constructing 
such systems in bilinear metric spaces. 

Let $(x, y)$ be a scalar product given in a complex vector space $K_n$ 
using some nonsingular Hermitian bilinear form. We consider a basis 
$e_1, e_2, \dots, e_n$ and try to construct another basis, $f_1, f_2, \dots, f_n$, 
possessing the following two properties: 

(1) for any $k \ge 1$ the spans $L_k$ of vectors $e_1, \dots, e_k$ and 
$f_1, \dots, f_k$ coincide, 

(2) the basis $f_1, \dots, f_n$ is pseudoorthogonal. 

Let $(e_1, e_1) \ne 0$ and put $f_1 = e_1$. Suppose a system of pseudoorthogonal 
vectors $f_1, \dots, f_k$ has already been constructed, with the span of 
those vectors and that of $e_1, \dots, e_k$ coinciding and $(f_i, f_i) \ne 0$ for 
$1 \le i \le k$. We shall seek a vector $f_{k+1}$ in the form 

$$f_{k+1} = e_{k+1} + \sum_{i=1}^{k} \alpha_{i,k+1} f_i, \tag{103.1}$$

where $\alpha_{1,k+1}, \dots, \alpha_{k,k+1}$ are unknown coefficients. The conditions 
of the left orthogonality of $f_{k+1}$ to $f_1, \dots, f_k$ give the following 
system of linear algebraic equations to determine 
$\alpha_{1,k+1}, \dots, \alpha_{k,k+1}$: 

$$\begin{aligned}
\alpha_{1,k+1}(f_1, f_1) &= -(e_{k+1}, f_1),\\
\alpha_{1,k+1}(f_1, f_2) + \alpha_{2,k+1}(f_2, f_2) &= -(e_{k+1}, f_2),\\
&\;\;\vdots\\
\alpha_{1,k+1}(f_1, f_k) + \alpha_{2,k+1}(f_2, f_k) + \dots + \alpha_{k,k+1}(f_k, f_k) &= -(e_{k+1}, f_k).
\end{aligned} \tag{103.2}$$






The matrix of the system is left triangular. By assumption its 
diagonal elements are nonzero, so system (103.2) has a unique solution. 
It is clear that the vector $f_{k+1}$ thus constructed and the vectors 
$f_1, \dots, f_k$ form together a pseudoorthogonal system and that their 
span coincides with that of $e_1, \dots, e_{k+1}$. The system of vectors 
$f_1, \dots, f_{k+1}$ is linearly independent, for so is the system 
$f_1, \dots, f_k, e_{k+1}$. 

We continue the process further. If it turns out that for every $i$ 
the products $(f_i, f_i)$ are nonzero, then the resulting system of vectors 
$f_1, \dots, f_n$ will be the desired pseudoorthogonal basis. Of course, we 
can now normalize the vectors $f_1, \dots, f_n$ and obtain a pseudoorthonormal 
basis. 

A useful consequence follows from (103.1). We rewrite it as 

$$e_{k+1} = \Bigl(-\sum_{i=1}^{k} \alpha_{i,k+1} f_i\Bigr) + f_{k+1}.$$

The vector in the parentheses lies in $L_k$ and the vector $f_{k+1}$ is in 
${}^\perp L_k$ by construction, so the solution of system (103.2) gives in fact 
a decomposition of each vector $e_{k+1}$ into the projection and the left 
perpendicular relative to $L_k$. 
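
The process (103.1), (103.2) can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not a prescribed implementation: the form is assumed to be given by its Gram matrix `G` in the coordinate basis, with $(x, y) = \sum_{i,j} x_i g_{ij} \bar y_j$, and the names `form` and `pseudo_orthogonalize` are hypothetical. The left triangular system (103.2) is solved by forward substitution.

```python
def form(G, x, y):
    """Hermitian bilinear form (x, y) = sum_ij x_i * G[i][j] * conj(y_j)."""
    n = len(G)
    return sum(x[i] * G[i][j] * y[j].conjugate()
               for i in range(n) for j in range(n))


def pseudo_orthogonalize(G, E):
    """Build f_1, ..., f_n from e_1, ..., e_n so that (f_i, f_j) = 0 for j < i."""
    F = []
    for e in E:
        # Forward substitution for the left triangular system (103.2).
        alphas = []
        for j, fj in enumerate(F):
            rhs = -form(G, e, fj) - sum(alphas[i] * form(G, F[i], fj)
                                        for i in range(j))
            alphas.append(rhs / form(G, fj, fj))
        # Formula (103.1): f_{k+1} = e_{k+1} + sum_i alpha_i f_i.
        f = list(e)
        for a, fi in zip(alphas, F):
            f = [fc + a * fic for fc, fic in zip(f, fi)]
        if form(G, f, f) == 0:
            raise ArithmeticError("degenerate step: (f, f) = 0")
        F.append(f)
    return F


# A nonsymmetric form, so the result is pseudoorthogonal but not orthogonal.
G = [[1, 1, 0], [0, 1, 1], [1, 0, 2]]
E = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
F = pseudo_orthogonalize(G, E)
```

For simplicity the sketch stops as soon as some $(f_i, f_i)$ vanishes, including the last one; replacing the offending vector $e_i$ is left to the caller.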

The process is greatly simplified if the scalar product is given by 
a Hermitian-symmetric bilinear form. In this case the conditions 
$(f_i, f_j) = 0$ for $j < i$ imply that the conditions $(f_i, f_j) = 0$ hold for 
$j \ne i$. System (103.2) therefore becomes a system with a diagonal 
matrix and we have 

$$\alpha_{i,k+1} = -\frac{(e_{k+1}, f_i)}{(f_i, f_i)}$$

for every $i$. The constructed basis $f_1, f_2, \dots, f_n$ will be not only 
pseudoorthogonal but it will also be orthogonal. 
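
For a Hermitian-symmetric form the diagonal system can be coded directly. The sketch below is an illustration only: the Gram matrix `G`, the function names and the sample data are our assumptions, and exact rational arithmetic from the standard `fractions` module is used so that the orthogonality can be checked without rounding.

```python
from fractions import Fraction


def form(G, x, y):
    """Form (x, y) = sum_ij x_i * G[i][j] * conj(y_j); G is assumed Hermitian."""
    n = len(G)
    return sum(x[i] * G[i][j] * y[j].conjugate()
               for i in range(n) for j in range(n))


def orthogonalize_symmetric(G, E):
    """Orthogonalization with alpha_i = -(e, f_i) / (f_i, f_i) for every i."""
    F = []
    for e in E:
        f = list(e)
        for fi in F:
            d = form(G, fi, fi)
            if d == 0:
                raise ArithmeticError("degenerate step: (f_i, f_i) = 0")
            a = -form(G, e, fi) / d
            f = [fc + a * fic for fc, fic in zip(f, fi)]
        F.append(f)
    return F


# Indefinite symmetric form with Gram matrix diag(1, -1).
G = [[1, 0], [0, -1]]
E = [[Fraction(2), Fraction(1)], [Fraction(1), Fraction(1)]]
F = orthogonalize_symmetric(G, E)
print(F[1])  # [Fraction(1, 3), Fraction(2, 3)]
```

Both $(f_2, f_1)$ and $(f_1, f_2)$ vanish here, although the form is not of constant signs.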

The only thing that may prevent the construction of a pseudoorthogonal 
basis $f_1, \dots, f_n$ from $e_1, \dots, e_n$ is that one of the scalar 
products $(f_i, f_i)$, $i < n$, may vanish. Such a situation is called 
degenerate. No degenerate situation will obviously set in if the quadratic 
form $(x, x)$ has no isotropic vectors, if, for example, it is strictly of 
constant signs. Indeed, the equation $(f_i, f_i) = 0$ is then possible only 
for $f_i = 0$. But $f_i \ne 0$ for every $i$, since the vectors $f_1, \dots, f_i$ are 
linearly independent. Hence the process can now be realized under any 
choice of basis $e_1, \dots, e_n$. 

In many problems it is not necessary to preserve the relation of 
the new basis $f_1, \dots, f_n$ to the original basis $e_1, \dots, e_n$, since it is 
required to construct only some pseudoorthogonal basis in space. 
It is then necessary whenever $(f_i, f_i) = 0$ occurs to replace $e_i$ by 
another vector and compute again a vector $f_i$, repeating the procedure 
until the condition $(f_i, f_i) \ne 0$ holds. The vectors $f_1, \dots, f_{i-1}$ 
remain unchanged. 
The vector required for the replacement of $e_i$ will necessarily be 
found. Suppose that $(f_i, f_i) = 0$ holds for any vector $e_i$. By virtue 
of the left orthogonality of the vector $f_i$ to the vectors $e_1, \dots, e_{i-1}$ 
this means that the subspace ${}^\perp L_{i-1}$ consists only of isotropic vectors 
and a zero vector. But the subspace $L_{i-1}$ is nonsingular, so 
${}^\perp L_{i-1} = {}^\perp K_n$. The last equation is impossible for $i - 1 < n$, since due 
to the nonsingularity of $K_n$ the subspace ${}^\perp K_n$ consists only of a zero 
vector. 

Similarly it is possible to construct a basis pseudodual to a given 
one. Suppose again that $e_1, e_2, \dots, e_n$ is a given basis and that 
it is necessary to construct a basis pseudodual to it, for example a left 
basis. Take another basis, $q_1, q_2, \dots, q_n$. Suppose $(q_1, e_1) \ne 0$ 
and put $t_1 = q_1$. Assume that a system of vectors $t_1, \dots, t_k$ has 
already been constructed such that their span coincides with that 
of vectors $q_1, \dots, q_k$, and that $(t_i, e_i) \ne 0$ holds for $1 \le i \le k$ 
and $(t_i, e_j) = 0$ does for $j < i$. We seek a vector $t_{k+1}$ in the form 

$$t_{k+1} = q_{k+1} + \sum_{i=1}^{k} \beta_{i,k+1} t_i, \tag{103.3}$$

where $\beta_{1,k+1}, \dots, \beta_{k,k+1}$ are unknown coefficients. The conditions 
of the left orthogonality of $t_{k+1}$ to the vectors $e_1, \dots, e_k$ 
again yield for the determination of $\beta_{1,k+1}, \dots, \beta_{k,k+1}$ a system 
of linear algebraic equations with a left triangular matrix: 

$$\begin{aligned}
\beta_{1,k+1}(t_1, e_1) &= -(q_{k+1}, e_1),\\
\beta_{1,k+1}(t_1, e_2) + \beta_{2,k+1}(t_2, e_2) &= -(q_{k+1}, e_2),\\
&\;\;\vdots\\
\beta_{1,k+1}(t_1, e_k) + \beta_{2,k+1}(t_2, e_k) + \dots + \beta_{k,k+1}(t_k, e_k) &= -(q_{k+1}, e_k).
\end{aligned} \tag{103.4}$$

According to the assumption about diagonal elements the system 
has a unique solution. If it is found in continuing the process that 
the quantities $(t_i, e_i)$ are nonzero for all $i$, then after an appropriate 
normalization the resulting system of vectors is a left pseudodual 
basis to $e_1, e_2, \dots, e_n$. Notice that now the process does not become 
simpler if the scalar product is given by a symmetric bilinear form. 
Employing an auxiliary basis $q_1, q_2, \dots, q_n$ makes it possible to 
avoid the degeneration of the process by replacing at the proper time 
one of the vectors $q_i$ and repeating the computation of the vector $t_i$. 
Again the vectors $t_1, \dots, t_{i-1}$ remain unchanged. 

In what follows, regardless of their particular content all the above 
and similar processes will most often be called orthogonalization 
processes. We shall sometimes have, however, to construct in the same 
bilinear metric space $K_n$ sequences of vectors orthogonal or 
pseudoorthogonal relative to different bilinear forms. We shall discuss only 
bilinear forms of the form $(Rx, y)$, where $R$ is some linear operator 
in $K_n$. To distinguish between sequences, we shall speak in this case 
of R-orthogonalization, R-pseudoorthogonalization, and so on. 

Many properties and features of orthogonalization processes can 
be established by considering their matrix notation. Let a scalar 
product in $K_n$ be given by a Hermitian bilinear form $(x, y)$. The 
pseudoorthogonality of a basis $f_1, f_2, \dots, f_n$ implies that 
$(f_i, f_j) = 0$ for $j < i$, i.e. that the Gram matrix $G_f$ of the bilinear form 
$(x, y)$ in the basis $f_1, f_2, \dots, f_n$ is right triangular. According to the 
process of constructing a new basis the spans of vectors $f_1, \dots, f_k$ 
and $e_1, \dots, e_k$ coincide. Hence in view of (103.1) we conclude that 

$$\begin{aligned}
e_1 &= f_1,\\
e_2 &= -\alpha_{1,2} f_1 + f_2,\\
&\;\;\vdots\\
e_n &= -\alpha_{1,n} f_1 - \alpha_{2,n} f_2 - \dots + f_n,
\end{aligned} \tag{103.5}$$

where $\alpha_{ij}$ are precisely the coefficients that are computed from system 
(103.2). Therefore the coordinate transformation matrix $A$ for a 
change from the new basis $f_1, f_2, \dots, f_n$ to $e_1, e_2, \dots, e_n$ is a right 
triangular matrix with unit diagonal elements. Since the coordinate 
transformation matrix for a change from the old basis to the new basis 
coincides with $A^{-1}$, we have 

$$G_f = A^{-1\prime} G_e \bar{A}^{-1}.$$

Hence 

$$G_e = A' G_f \bar{A}. \tag{103.6}$$

It is easy to verify that $G_f \bar{A}$ is a right triangular matrix whose 
diagonal elements coincide with those of $G_f$. 
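
A small worked example may make the factorization concrete. The data are chosen by us purely for illustration and are real, so complex conjugation plays no role. Let the form have in the basis $e_1, e_2, e_3$ the Gram matrix

$$G_e = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 2 \end{pmatrix}.$$

Solving system (103.2) step by step gives $f_1 = e_1$, $f_2 = e_2$, $f_3 = e_3 - f_1 + f_2$, that is $e_3 = f_1 - f_2 + f_3$, so that

$$A = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & -1 \\ 0 & 0 & 1 \end{pmatrix}, \qquad
G_f = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 2 \\ 0 & 0 & 3 \end{pmatrix}, \qquad
G_f A = \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 3 \end{pmatrix}.$$

The matrix $G_f A$ is right triangular with the diagonal of $G_f$, and multiplying it on the left by the left triangular matrix $A'$ indeed returns $G_e$:

$$A'(G_f A) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 1 & -1 & 1 \end{pmatrix}
\begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 3 \end{pmatrix}
= \begin{pmatrix} 1 & 1 & 0 \\ 0 & 1 & 1 \\ 1 & 0 & 2 \end{pmatrix}.$$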

Denote by $E_q$ ($F_q$) a matrix whose columns are the coordinates of 
vectors $e_1, \dots, e_n$ ($f_1, \dots, f_n$) in a basis $q_1, \dots, q_n$. Relations 
(103.5) show that 

$$E_q = F_q A, \tag{103.7}$$

and that of course 

$$G_f = F_q' G_q \bar{F}_q. \tag{103.8}$$

Thus the above process of constructing a pseudoorthogonal basis 
proves to be closely related to the factorization of a Gram matrix 
into triangular factors and to factorization (103.7) into the factors 
of the matrix of coordinates. 

Theorem 103.1. For process (103.1) and (103.2) of constructing 
a pseudoorthogonal basis $f_1, f_2, \dots, f_n$ from a basis $e_1, e_2, \dots, e_n$ 
in a nonsingular bilinear metric space $K_n$ to be implementable it is 
necessary and sufficient that the Gram matrix of the system 
$e_1, e_2, \dots, e_n$ should have nonzero principal minors. 
Proof. Necessity. Let the process be implementable, i.e. let 
relation (103.6) hold. The matrix $G_f$ is nonsingular, since it is the Gram 
matrix of a nonsingular bilinear form $(x, y)$ for a basis. Therefore 
all its diagonal elements are nonzero. Applying the Binet-Cauchy 
formula to (103.6) we find that the principal minor of order $r$ of 
the matrix $G_e$ equals 

$$(f_1, f_1)(f_2, f_2) \cdots (f_r, f_r) \ne 0$$

for every $r$. 

Sufficiency. Let the principal minors of the Gram matrix $G_e$ 
be nonzero. Hence according to (93.1) there is a decomposition 
$G_e = L_e D_e U_e$, where $L_e$ is a left triangular matrix with unit diagonal 
elements, $D_e$ is a diagonal matrix with nonzero elements and $U_e$ 
is a right triangular matrix with unit diagonal elements. It is easy 
to see that the matrix 

G, = U?'GjU- t ' = V?'L f D e 

is a left triangular matrix whose diagonal elements coincide with 
those of $D_e$. If we now take as a matrix $A$ a right triangular matrix 
$U_e$ with unit diagonal elements, then relations (103.5) will hold for 
the basis $f_1, f_2, \dots, f_n$. It is this basis that will be constructed 
according to process (103.1) and (103.2), which can easily be seen 
from a direct check. 

If the scalar product is given by a symmetric Hermitian bilinear 
form, then the matrix $G_e$ will be Hermitian, as well as the matrix $G_f$. 
But then it follows that $G_f$ is diagonal. This fact has already been noted. 
Comparing (93.5) and (103.6) we conclude that in this case the 
orthogonalization process virtually coincides with the process 
of obtaining decomposition (93.5). 

If $K_n$ is a unitary space, then the orthogonalization process 
determines not only the factorization of the Gram matrix into triangular 
factors but also the decomposition of the matrix of coordinates as 
a product of a unitary and a right triangular factor. Indeed, 
choose an orthonormal basis $q_1, q_2, \dots, q_n$ and denote by $D_q$ 
a diagonal matrix made up of the lengths of the columns of the 
matrix $F_q$ of (103.7). We now have $E_q = (F_q D_q^{-1})(D_q A)$. The 
matrix $D_q A$ is right triangular. But $G_q$ and $G_f (D_q^{-1})^2$ are unit 
matrices. In this case according to (103.8) 
$(F_q D_q^{-1})' \overline{(F_q D_q^{-1})} = E$, i.e. 
$F_q D_q^{-1}$ is a unitary matrix. 
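
The decomposition $E_q = (F_q D_q^{-1})(D_q A)$ can be traced numerically. The sketch below is an illustration under our own assumptions: vectors are given by their coordinates in an orthonormal basis, matrices are stored as lists of columns (for `Q`) or rows (for `R`), and the names `inner` and `qr_by_orthogonalization` are hypothetical. It uses plain floating-point arithmetic and makes no claim to numerical robustness.

```python
import math


def inner(x, y):
    """Unitary scalar product, linear in the first argument."""
    return sum(a * complex(b).conjugate() for a, b in zip(x, y))


def qr_by_orthogonalization(columns):
    """Return (Q, R): Q holds the columns of F_q D_q^{-1}, R the rows of D_q A."""
    n = len(columns)
    F = []                                   # orthogonal vectors f_1, ..., f_n
    A = [[0.0] * n for _ in range(n)]        # e_j = sum_i A[i][j] * f_i
    for j, e in enumerate(columns):
        f = list(e)
        for i, fi in enumerate(F):
            A[i][j] = inner(e, fi) / inner(fi, fi)   # equals -alpha_{i,j}
            f = [fc - A[i][j] * fic for fc, fic in zip(f, fi)]
        A[j][j] = 1.0
        F.append(f)
    # Lengths of the f_j; a zero length means the columns were dependent.
    lengths = [math.sqrt(inner(f, f).real) for f in F]
    Q = [[c / d for c in f] for f, d in zip(F, lengths)]
    R = [[lengths[i] * A[i][j] for j in range(n)] for i in range(n)]
    return Q, R


Q, R = qr_by_orthogonalization([[3.0, 4.0], [1.0, 2.0]])
```

Each original column is recovered as $e_j = \sum_i r_{ij} q_i$, where $q_i$ are the columns stored in `Q`; for dependent columns the sketch fails with a division by zero.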

The fact that the basis $t_1, t_2, \dots, t_n$ is left pseudodual to the 
basis $e_1, e_2, \dots, e_n$ implies that for the bilinear form $(x, y)$ 
determining the scalar product in $K_n$ conditions $(t_i, e_j) = 0$ for $j < i$ 
and $(t_i, e_i) = 1$ for every $i$ hold. In other words, this implies that 
for a pair of bases $e_1, e_2, \dots, e_n$ and $t_1, t_2, \dots, t_n$ the matrix 
$G_{te}$ of the bilinear form $(x, y)$ is a right triangular matrix with unit 
diagonal elements. From this we conclude that the coordinate 
transformation matrix $Q^{-1}$ for a change from the original basis 
$q_1, q_2, \dots, q_n$ to $t_1, t_2, \dots, t_n$ is right triangular. Now the diagonal 
elements will not equal unity, however, since the vectors $t_1, t_2, \dots, t_n$ 
have undergone normalization. We have 

$$G_{te} = Q^{-1\prime} G_{qe}$$

and further 

$$G_{qe} = Q' G_{te}.$$

The process of constructing a basis pseudodual to a given basis 
also proves to be closely related to factorization (93.1) of a matrix 
into triangular factors. 

Theorem 103.2. For process (103.3) and (103.4) of constructing 
a basis $t_1, t_2, \dots, t_n$, left pseudodual to $e_1, e_2, \dots, e_n$, starting 
from $q_1, q_2, \dots, q_n$ to be implementable it is necessary and sufficient 
that the matrix $G_{qe}$ of a bilinear form $(x, y)$ should have nonzero 
principal minors in the bases $q_1, q_2, \dots, q_n$ and $e_1, e_2, \dots, e_n$. 

The proof of this theorem is omitted since it is almost a word-for-word 
repetition of the proof of the preceding theorem. 

To conclude, the orthogonalization processes considered carry over 
completely to ordinary bilinear metric spaces. There are changes 
in some details associated with complex conjugation. Besides, it is 
more complicated to eliminate degenerate situations. 


Exercises 

1. What is the geometrical interpretation of the 
orthogonalization process? 

2. Prove that if the orthogonalization process is applied to a linearly 
dependent system $e_1, e_2, \dots, e_n$, then $f_k = 0$ for some $k \le n$. 

3. Let a quadratic form $(x, x)$ have no isotropic vectors. How is a basis 
of the given system of vectors to be constructed using the orthogonalization 
process? 

4. Prove that if the orthogonalization process is carried out in a Euclidean 
or a unitary space, then $|f_k| \le |e_k|$ for every $k$, equality holding if and only 
if the vector $e_k$ is orthogonal to vectors $e_1, \dots, e_{k-1}$. 

5. Let the coordinates of vectors $e_1, \dots, e_n$ form a triangular matrix in some 
orthonormal basis of a Euclidean or a unitary space. How is the matrix of 
coordinates affected by the orthogonalization process? 

6. Is it possible to construct a dual basis using the orthogonalization 
process? 

7. How is the orthogonalization process to be applied to obtain a right 
pseudodual basis? 

8. Prove that the decomposition of a complex nonsingular matrix into 
a product of a unitary matrix and a right triangular matrix is unique if it is 
required that the diagonal elements of the triangular matrix should be positive. 

9. How is the orthogonalization process to be applied in order to solve 
a system of linear algebraic equations? 

10. Let a space $K_n$ be singular. How is a pseudoorthogonal basis of a 
nonsingular subspace of maximum dimension to be constructed using the 
orthogonalization process? 

11. Does the construction of pseudoorthogonal systems of vectors become 
simpler in a singular space if the quadratic form $(x, x)$ is of constant signs? 


104. Orthogonalization 
of a power sequence 

In orthogonalization processes the coordinate 
transformation matrix for a change from the old basis to a new basis 
is always triangular. However, if the original basis is chosen in 
a special way, it is possible to obtain significantly simpler 
representations for the coordinate transformation matrix and hence simpler 
orthogonalization processes. 

Let $A$ be some operator in a nonsingular Hermitian bilinear metric 
space $K_n$. Take a nonzero vector $x$ and consider a sequence of vectors 

$$x,\ Ax,\ A^2x,\ \dots,\ A^{k-1}x. \tag{104.1}$$

We shall call such sequences power sequences generated by the 
vector $x$. 

In any power sequence some number of the first vectors is linearly 
independent. Suppose $k$ is the largest of such numbers. This means 
that there are numbers $\alpha_0, \alpha_1, \dots, \alpha_k$, with $\alpha_k \ne 0$, such that 

$$\alpha_0 x + \alpha_1 Ax + \dots + \alpha_k A^k x = 0. \tag{104.2}$$

Denote by $\varphi(\lambda) = \alpha_k\lambda^k + \dots + \alpha_1\lambda + \alpha_0$ a polynomial of 
degree $k$. Clearly (104.2) is equivalent to 

$$\varphi(A)x = 0. \tag{104.3}$$

There are many polynomials for which relations of the type 
(104.3) hold. In particular, such a polynomial is the characteristic 
polynomial of $A$. But there is clearly a polynomial of the lowest 
degree among them. It is called the minimum polynomial annihilating 
the vector $x$. It is clear that its degree equals the maximum 
number of the first vectors of the power sequence (104.1) that form 
a linearly independent system or equivalently is one less than 
the minimum number of the first vectors that form a linearly 
dependent system. 

The degree of the minimum polynomial turns out to be closely 
related to the expansion of the vector $x$ with respect to the root 
basis of the operator $A$ by the heights of the root vectors and the 
number of mutually distinct eigenvalues. That is, we have 

Lemma 104.1. The degree of a minimum polynomial annihilating 
a vector $x$ equals the sum of the maximum heights of the root vectors of 
an operator $A$ present in the expansion of $x$ with respect to a root basis 
and corresponding to mutually distinct eigenvalues. 

Proof. We represent the vector $x$ as a sum 

$$x = u_1 + u_2 + \dots + u_s, \tag{104.4}$$

where $u_1, \dots, u_s$ lie in different cyclic subspaces of $A$. Since different 
cyclic subspaces have no common vectors except the zero vector, 
for equation (104.3) to hold it is necessary and sufficient that the 
equations $\varphi(A)u_i = 0$ should hold for every $i$. If $u_i$ is a root vector 
of height $m_i$ and corresponds to an eigenvalue $\lambda_i$, then the equation 
$\varphi(A)u_i = 0$ will hold if and only if the polynomial $\varphi(\lambda)$ is divisible 
by $(\lambda - \lambda_i)^r$, where $r \ge m_i$. In this case $\varphi(A)u_j = 0$ not only for 
$j = i$ but also for all those $j$ for which the vectors $u_j$ correspond to 
the eigenvalues coinciding with $\lambda_i$ and have heights not greater 
than $r$. Let $\lambda_{i_1}, \dots, \lambda_{i_p}$ be mutually distinct eigenvalues 
corresponding to the vectors $u_1, \dots, u_s$ of (104.4) and let $m_{i_1}, \dots, m_{i_p}$ 
be the maximum heights of the root vectors $u_1, \dots, u_s$ corresponding 
to the eigenvalues coinciding with $\lambda_{i_1}, \dots, \lambda_{i_p}$. Then 

$$\varphi(\lambda) = (\lambda - \lambda_{i_1})^{m_{i_1}} \cdots (\lambda - \lambda_{i_p})^{m_{i_p}}$$

will be the minimum polynomial annihilating the vector $x$. Thus the 
lemma is proved. 
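
The degree in Lemma 104.1 can be checked experimentally by counting how many leading vectors of the power sequence are linearly independent. The following sketch is an illustration only; the function names and the sample matrices are our assumptions, and exact rational arithmetic is used so that linear dependence is detected without rounding.

```python
from fractions import Fraction


def mat_vec(A, x):
    """Product of a square matrix (list of rows) and a vector."""
    return [sum(a * c for a, c in zip(row, x)) for row in A]


def annihilator_degree(A, x):
    """Number of leading linearly independent vectors of x, Ax, A^2 x, ..."""
    basis = []                      # pairs (pivot index, reduced vector)
    v = [Fraction(c) for c in x]
    while True:
        # Reduce the current power vector against the vectors found so far.
        w = list(v)
        for p, b in basis:
            if w[p] != 0:
                factor = w[p] / b[p]
                w = [wc - factor * bc for wc, bc in zip(w, b)]
        pivot = next((i for i, c in enumerate(w) if c != 0), None)
        if pivot is None:           # v depends on the previous vectors
            return len(basis)
        basis.append((pivot, w))
        v = mat_vec(A, v)


jordan = [[2, 1], [0, 2]]           # one eigenvalue, root vectors of height 1 and 2
print(annihilator_degree(jordan, [0, 1]))   # prints 2
print(annihilator_degree(jordan, [1, 0]))   # prints 1
```

In the first call the vector has height 2, in the second it is an eigenvector of height 1, in agreement with the lemma.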

Suppose the vectors $e_i = A^{i-1}x$ for $1 \le i \le k$ are linearly 
independent. Apply to this system the process described earlier for 
obtaining a pseudoorthogonal system of vectors $f_i$, assuming of course that 
the process is feasible. If the operator $A$ is in no way related to the 
scalar product introduced in $K_n$, then there is little hope for any 
simplification of the process. The situation sharply changes, however, 
if $A$ satisfies relation (101.7), if for example it is a self-adjoint 
operator in a unitary space. 

Theorem 104.1. If an operator $A$ satisfies relation (101.7), vectors 
$e_i = A^{i-1}x$ are linearly independent for $1 \le i \le k$ and vectors 
$f_1, \dots, f_k$ are obtained from $e_1, \dots, e_k$ using the 
pseudoorthogonalization process, then the following relations hold: 

$$\begin{aligned}
f_1 &= x,\\
f_2 &= Af_1 - \alpha_1 f_1,\\
f_{i+1} &= Af_i - \alpha_i f_i - \beta_{i-1} f_{i-1}, \quad i > 1,
\end{aligned} \tag{104.5}$$

where 

$$\alpha_1 = \frac{(Af_1, f_1)}{(f_1, f_1)}, \qquad
\beta_{i-1} = \frac{(Af_i, f_{i-1})}{(f_{i-1}, f_{i-1})}, \qquad
\alpha_i = \frac{(f_{i-1}, f_{i-1})(Af_i, f_i) - (Af_i, f_{i-1})(f_{i-1}, f_i)}{(f_{i-1}, f_{i-1})(f_i, f_i)}. \tag{104.6}$$


For definiteness the proof is made in a Hermitian bilinear metric 
space. Taking into account the form of vectors $e_i$, we conclude from 
(103.5) that 

$$f_i = A^{i-1}x + \sum_{j=0}^{i-2} \gamma_{j,i} A^j x$$

for some numbers $\gamma_{j,i}$. It follows that the vector $f_{i+1} - Af_i$ is in 
the span of vectors $x, Ax, \dots, A^{i-1}x$ or equivalently that of vectors 
$f_1, f_2, \dots, f_i$. Therefore 

$$f_{i+1} = Af_i + \sum_{j=1}^{i} \xi_{j,i+1} f_j$$

for some numbers $\xi_{j,i+1}$. The conditions of the left orthogonality 
of $f_{i+1}$ to $f_1, f_2, \dots, f_i$ yield the following system of linear 
algebraic equations to determine the coefficients $\xi_{j,i+1}$: 

$$\begin{aligned}
\xi_{1,i+1}(f_1, f_1) &= -(Af_i, f_1),\\
\xi_{1,i+1}(f_1, f_2) + \xi_{2,i+1}(f_2, f_2) &= -(Af_i, f_2),\\
&\;\;\vdots\\
\xi_{1,i+1}(f_1, f_{i-1}) + \dots + \xi_{i-1,i+1}(f_{i-1}, f_{i-1}) &= -(Af_i, f_{i-1}),\\
\xi_{1,i+1}(f_1, f_i) + \dots + \xi_{i-1,i+1}(f_{i-1}, f_i) + \xi_{i,i+1}(f_i, f_i) &= -(Af_i, f_i).
\end{aligned} \tag{104.7}$$

Under the hypothesis of the theorem the operator $A$ satisfies 
condition (101.7). Therefore in view of the pseudoorthogonality of 
the system of vectors $f_j$ we have for $j < i - 1$ 

$$\begin{aligned}
(Af_i, f_j) &= (f_i, A^*f_j) = (f_i, (\alpha E + \beta A) f_j) = \bar\alpha (f_i, f_j) + \bar\beta (f_i, Af_j)\\
&= \bar\beta \Bigl(f_i,\ f_{j+1} - \sum_{l=1}^{j} \xi_{l,j+1} f_l\Bigr)
= \bar\beta \Bigl\{(f_i, f_{j+1}) - \sum_{l=1}^{j} \bar\xi_{l,j+1} (f_i, f_l)\Bigr\} = 0.
\end{aligned}$$

Among the right-hand sides of system (104.7) only the last two are 
nonzero. Hence only $\xi_{i-1,i+1}$ and $\xi_{i,i+1}$ may be different from zero, 
which proves the validity of relations (104.5). The value for the 
coefficient $\alpha_1$ is found from the condition of the left orthogonality of 
$f_2$ to $f_1$ and the values for the coefficients $\alpha_i$ and $\beta_{i-1}$ are found from 
the condition of the left orthogonality of the vector $f_{i+1}$ to the 
vectors $f_i$ and $f_{i-1}$. 

Thus the orthogonalization process for a power sequence does turn 
out to be much simpler than that for a general sequence. If at some 
step it is found that $(f_i, f_i) = 0$ but $f_i \ne 0$, then the degeneration 
of the process can be avoided by choosing a new vector $x$. 
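
Relations (104.5) and (104.6) translate into a short three-term recurrence. The sketch below is an illustration under stated assumptions: the scalar product is taken as $(x, y) = \sum_{i,j} x_i g_{ij} \bar y_j$ with a Gram matrix `G`, the operator is given by its matrix `A` in the same basis and is assumed to satisfy (101.7), and the function names and sample data are ours. In the example $A^* = A$ relative to the indefinite form with Gram matrix diag(1, -1, 1).

```python
from fractions import Fraction


def form(G, x, y):
    """Form (x, y) = sum_ij x_i * G[i][j] * conj(y_j)."""
    n = len(G)
    return sum(x[i] * G[i][j] * y[j].conjugate()
               for i in range(n) for j in range(n))


def mat_vec(A, x):
    return [sum(a * c for a, c in zip(row, x)) for row in A]


def three_term(G, A, x, steps):
    """Vectors f_1, ..., f_steps computed by relations (104.5), (104.6)."""
    F = [[Fraction(c) for c in x]]
    for i in range(steps - 1):
        fi = F[i]
        Afi = mat_vec(A, fi)
        dii = form(G, fi, fi)
        if dii == 0:
            raise ArithmeticError("degenerate step: (f_i, f_i) = 0")
        if i == 0:
            alpha = form(G, Afi, fi) / dii
            nxt = [a - alpha * c for a, c in zip(Afi, fi)]
        else:
            fp = F[i - 1]            # (fp, fp) was checked at the previous step
            beta = form(G, Afi, fp) / form(G, fp, fp)
            alpha = (form(G, Afi, fi) - beta * form(G, fp, fi)) / dii
            nxt = [a - alpha * c - beta * p for a, c, p in zip(Afi, fi, fp)]
        F.append(nxt)
    return F


G = [[1, 0, 0], [0, -1, 0], [0, 0, 1]]
A = [[2, 1, 0], [-1, -3, -1], [0, 1, 4]]    # satisfies A'G = GA, i.e. A* = A
F = three_term(G, A, [1, 0, 0], 3)
```

Only the two preceding vectors are needed at each step, whereas the general process of Section 103 would use all of them.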

Suppose that there are $n$ linearly independent vectors in the power 
sequence. Applying the orthogonalization process it is possible to 
construct a basis $f_1, f_2, \dots, f_n$ of the space $K_n$. This basis is 
remarkable in that the matrix $A_f$ of the operator $A$ has in it the 
tridiagonal form 

$$A_f = \begin{pmatrix}
\alpha_1 & \beta_1 & & & & \\
1 & \alpha_2 & \beta_2 & & 0 & \\
 & 1 & \alpha_3 & \beta_3 & & \\
 & & \ddots & \ddots & \ddots & \\
 & 0 & & 1 & \alpha_{n-1} & \beta_{n-1} \\
 & & & & 1 & \alpha_n
\end{pmatrix}. \tag{104.8}$$

Indeed, the columns of the matrix of the operator are the coordinates 
of the vectors $Af_1, Af_2, \dots, Af_n$ relative to the basis $f_1, f_2, \dots, f_n$. 
But by (104.5) 

$$\begin{aligned}
Af_1 &= \alpha_1 f_1 + f_2,\\
Af_2 &= \beta_1 f_1 + \alpha_2 f_2 + f_3,\\
&\;\;\vdots\\
Af_{n-1} &= \beta_{n-2} f_{n-2} + \alpha_{n-1} f_{n-1} + f_n,\\
Af_n &= \beta_{n-1} f_{n-1} + \alpha_n f_n.
\end{aligned} \tag{104.9}$$

If there are not $n$ linearly independent vectors in the power 
sequence, then applying the orthogonalization process we get $f_{r+1} = 0$ 
for some $r < n$. We take a new vector $u$ and form a vector 

$$v = u - \sum_{j=1}^{r} \eta_j f_j,$$

determining the coefficients $\eta_j$ from the left orthogonality of the 
vector $v$ to vectors $f_1, f_2, \dots, f_r$. We construct a power sequence 
generated by $v$. It is easy to show that each vector of that sequence 
is also left orthogonal to $f_1, f_2, \dots, f_r$. By construction the vector $v$ 
itself has this property. Suppose that we have proved 
it for all vectors $v, Av, \dots, A^k v$. Then, considering relations (104.9) 
and equation (101.7), we get for the operator $A$ 

$$\begin{aligned}
(A^{k+1}v, f_i) &= (A^k v, A^* f_i) = (A^k v, (\alpha E + \beta A) f_i)\\
&= \bar\alpha (A^k v, f_i) + \bar\beta (A^k v,\ \beta_{i-1} f_{i-1} + \alpha_i f_i + f_{i+1}) = 0.
\end{aligned}$$

Applying the orthogonalization process to the new sequence we 
construct a system of linearly independent vectors $q_1, q_2, \dots, q_s$ 
pseudoorthogonal to one another and, as a collection, to 
$f_1, f_2, \dots, f_r$. If $r + s < n$, we continue the process of supplementing the 
basis until we have constructed a basis of the entire space, which 
will decompose into a direct sum of invariant subspaces. In such 
a basis the matrix of the operator $A$ will have a block-diagonal form 
with tridiagonal blocks of the type (104.8). 
Exercises 


1. Prove that the minimum polynomial annihilating 
a vector $x$ is a divisor of the characteristic polynomial. 

2. Prove that the minimum polynomial annihilating a vector $x$ is unique 
up to a scalar multiplier. 

3. Prove that if $A$ is a Hermitian operator and $K_n$ is a unitary space then 
formulas (104.6) become as follows: 

$$\alpha_i = \frac{(Af_i, f_i)}{(f_i, f_i)}, \qquad
\beta_{i-1} = \frac{(Af_i, f_{i-1})}{(f_{i-1}, f_{i-1})} = \frac{(f_i, Af_{i-1})}{(f_{i-1}, f_{i-1})} = \frac{(f_i, f_i)}{(f_{i-1}, f_{i-1})} > 0.$$

4. Prove that under the conditions of Exercise 3 there is a diagonal matrix $D$ 
such that for the matrix $A_f$ of (104.8) the matrix $D^{-1}A_f D$ is real symmetric 
tridiagonal. 

5. Prove that if condition (101.7) holds then the matrices of bilinear forms 
$(Ax, y)$ and $(x, Ay)$ in the basis $f_1, \dots, f_n$ are right almost triangular. 

6. Prove that if conditions of Exercise 3 hold then the matrices of bilinear 
forms $(Ax, y)$ and $(x, Ay)$ in the basis $f_1, \dots, f_n$ are Hermitian tridiagonal. 

7. Prove that if $K_n$ is a singular space then using processes (104.5) and (104.6) 
it is possible to construct a pseudoorthogonal basis of a nonsingular subspace 
of maximum dimension. 


105. Methods of conjugate directions 

Constructing orthogonal, pseudoorthogonal, 
etc. systems of vectors, especially by making use of power sequences, 
provides great possibilities for developing various numerical methods 
of solving equations of the form 

Ax = b, (105.1) 

where A is an operator in a vector space K n , b is a given vector and x 
is the desired vector. 

We have repeatedly touched on various aspects of this problem. 
We now describe a large group of numerical methods of solving 
equation (105.1) known under the general name of methods of conjugate 
directions. They are all based on orthogonalization processes 
of power sequences. Suppose for simplicity of presentation that the 
operator $A$ is nonsingular and that hence equation (105.1) always 
has a unique solution. We assume that the space $K_n$ is complex and 
that the scalar product in it is given using a symmetric positive 
definite Hermitian bilinear form, i.e. that $K_n$ is a unitary space. 

We take any nonsingular operators C and B and let $s_1, \ldots, s_n$ 
be some CAB-pseudoorthogonal system of vectors, i.e. let 

$$(CABs_i, s_i) \ne 0, \qquad (CABs_i, s_k) = 0, \quad k < i,$$

for every i. Denote by $x_0$ the initial vector and let 

$$x = x_0 + B\sum_{j=1}^{n} a_j s_j, \qquad
x_i = x_0 + B\sum_{j=1}^{i} a_j s_j, \qquad
r_i = Ax_i - b. \tag{105.2}$$


Then from the relations 

$$x_i = x_{i-1} + a_i Bs_i \tag{105.3}$$

it follows that 

$$r_i = r_{i-1} + a_i ABs_i. \tag{105.4}$$


It is easy to show that for the chosen CAB-pseudoorthogonal system 

$$(Cr_i, s_k) = 0, \quad 1 \le k \le i. \tag{105.5}$$

Indeed 

$$r_i = Ax_i - b = A(x_i - x) = -\sum_{j=i+1}^{n} a_j ABs_j$$

and further 

$$(Cr_i, s_k) = -\sum_{j=i+1}^{n} a_j (CABs_j, s_k) = 0$$

for every $k \le i$. 

We assume that the system of vectors $s_1, \ldots, s_n$ is constructed 
parallel to the system $r_0, \ldots, r_{n-1}$ using the process of its 
CAB-pseudoorthogonalization. Set $s_1 = r_0$ and for every i let 

$$s_{i+1} = r_i + \sum_{k=1}^{i} \beta_{k,i+1} s_k. \tag{105.6}$$

As always the conditions of the left CAB-pseudoorthogonality of 
a vector $s_{i+1}$ to vectors $s_1, \ldots, s_i$ yield a left triangular system to 
determine the coefficients $\beta_{k,i+1}$. In this case $r_k$ is a linear 
combination of vectors $s_1, \ldots, s_{k+1}$. Hence the scalar product $(Cr_i, r_k)$ is a 
linear combination of numbers $(Cr_i, s_1), \ldots, (Cr_i, s_{k+1})$ and it is 
zero by (105.5) for $k < i$, i.e. 

$$(Cr_i, r_k) = 0, \quad k < i. \tag{105.7}$$

This means that if $(Cr_i, r_i) \ne 0$ for every i then the sequence of 
vectors $r_i$ is C-pseudoorthogonal. In an n-dimensional vector space 
a C-pseudoorthogonal system cannot contain more than n nonzero 
vectors. At some step of the computational process therefore one of 
the discrepancies vanishes and we obtain an exact solution of 
equation (105.1). 

To implement the process, it is necessary for us to determine the 
coefficients $a_i$ in (105.2) and the coefficients $\beta_{k,i+1}$ in (105.6). The 
coefficients $a_i$ can always be found in a very simple way. According 
to (105.4), (105.5) and (105.7) we have 

$$a_i = -\frac{(Cr_{i-1}, r_{i-1})}{(CABs_i, r_{i-1})}
= -\frac{(Cr_{i-1}, s_i)}{(CABs_i, s_i)}. \tag{105.8}$$


In general computation of the coefficients $\beta_{k,i+1}$ is much more 
complicated. But if the operators A, B and C are connected by the 
relation 

$$(CABC^{-1})^* = \alpha E + \beta AB \tag{105.9}$$

for some numbers $\alpha$ and $\beta$, then among all the coefficients $\beta_{k,i+1}$ 
only $\beta_{i,i+1}$ can be nonzero. Let 

$$s_{i+1} = r_i + b_i s_i. \tag{105.10}$$

The coefficient $b_i$ is uniquely determined from the condition of the 
left CAB-orthogonality of the vector $s_{i+1}$ to $s_i$, which, considering 
(105.9) and (105.10), yields 

$$b_i = -\frac{(CABr_i, s_i)}{(CABs_i, s_i)}
= -\bar\beta\,\frac{(Cr_i, ABs_i)}{(CABs_i, s_i)}. \tag{105.11}$$


Suppose that computing a sequence of vectors $s_i$ by formulas 
(105.10) and (105.11) we have shown that the sequence $s_1, \ldots, s_i$ 
forms a CAB-pseudoorthogonal system. This is obviously true for 
i = 2. Taking into account (105.9) we get for $k < i$ from (105.4) 
to (105.7) 

$$\begin{aligned}
(CABs_{i+1}, s_k) &= (CABr_i, s_k) + b_i (CABs_i, s_k) = ((CABC^{-1})Cr_i, s_k) \\
&= (Cr_i, (CABC^{-1})^* s_k) = (Cr_i, (\alpha E + \beta AB)s_k)
= \bar\alpha (Cr_i, s_k) + \bar\beta (Cr_i, ABs_k) \\
&= \bar\beta \Big(Cr_i, \frac{1}{a_k}(r_k - r_{k-1})\Big)
= \frac{\bar\beta}{\bar a_k}\{(Cr_i, r_k) - (Cr_i, r_{k-1})\} = 0.
\end{aligned}$$

Thus, (105.9) holding, the solution of the operator equation (105.1) 
can be effected according to the following prescription: 

$$\begin{aligned}
s_1 &= r_0, \\
r_i &= r_{i-1} + a_i ABs_i, \\
s_{i+1} &= r_i + b_i s_i, \\
x_i &= x_{i-1} + a_i Bs_i.
\end{aligned} \tag{105.12}$$


Here $x_0$ is an arbitrary initial vector, the coefficients $a_i$ and $b_i$ are 
computed according to (105.8) and (105.11). Letting $u_i = Bs_i$, 
the process will be as follows: 

$$\begin{aligned}
u_1 &= Br_0, \\
r_i &= r_{i-1} + a_i Au_i, \\
u_{i+1} &= Br_i + b_i u_i, \\
x_i &= x_{i-1} + a_i u_i
\end{aligned}$$

with 

$$a_i = -\frac{(Cr_{i-1}, r_{i-1})}{(CAu_i, r_{i-1})}
= -\frac{(B^{-1*}Cr_{i-1}, u_i)}{(B^{-1*}CAu_i, u_i)}, \qquad
b_i = -\frac{(B^{-1*}CABr_i, u_i)}{(B^{-1*}CAu_i, u_i)}
= -\bar\beta\,\frac{(Cr_i, Au_i)}{(B^{-1*}CAu_i, u_i)}.$$

It is these processes that are called methods of conjugate directions. 
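
Prescription (105.12) can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not a production solver: the matrices are real and stored as lists of rows (so the adjoint is the transpose), the coefficients are taken from the second form of (105.8) and the first form of (105.11), and the helper names `matvec`, `dot` and `conjugate_directions` are ours.

```python
def matvec(M, v):
    """Product of a square matrix (list of rows) and a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    """Real scalar product (u, v)."""
    return sum(a * b for a, b in zip(u, v))

def conjugate_directions(A, B, C, b, x0, tol=1e-12):
    """Scheme (105.12): s_1 = r_0, r_i = r_{i-1} + a_i ABs_i,
    s_{i+1} = r_i + b_i s_i, x_i = x_{i-1} + a_i Bs_i."""
    n = len(b)
    x = list(x0)
    r = [p - q for p, q in zip(matvec(A, x), b)]    # r_0 = A x_0 - b
    s = list(r)                                     # s_1 = r_0
    for _ in range(n):
        if dot(r, r) ** 0.5 <= tol:
            break
        Bs = matvec(B, s)
        ABs = matvec(A, Bs)
        denom = dot(matvec(C, ABs), s)              # (CABs_i, s_i), assumed nonzero
        a = -dot(matvec(C, r), s) / denom           # (105.8), second form
        x = [xi + a * v for xi, v in zip(x, Bs)]
        r = [ri + a * v for ri, v in zip(r, ABs)]
        CABr = matvec(C, matvec(A, matvec(B, r)))
        bcoef = -dot(CABr, s) / denom               # (105.11), first form
        s = [ri + bcoef * si for ri, si in zip(r, s)]
    return x

# Nonsymmetric A with B = A^T and C = E, so that CAB = A A^T is positive definite.
A = [[4.0, 1.0, 0.0], [2.0, 3.0, 1.0], [0.0, 1.0, 5.0]]
At = [list(col) for col in zip(*A)]
E = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
rhs = [1.0, 2.0, 3.0]
x = conjugate_directions(A, At, E, rhs, [0.0, 0.0, 0.0])
residual = [p - q for p, q in zip(matvec(A, x), rhs)]
```

With this choice of B and C condition (105.9) holds for $\alpha = 0$, $\beta = 1$, so the two-term recurrence is legitimate and the discrepancy should vanish, up to rounding, after at most three steps.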

From formulas (105.4) and (105.10) we conclude that vectors $r_i$ 
and $s_{i+1}$ are linear combinations of vectors of the same power 
sequence 

$$r_0, \; ABr_0, \; \ldots, \; (AB)^i r_0. \tag{105.13}$$

Moreover, they are obtained from it using C- and CAB-pseudoorthogonalizations 
respectively. This result has consequences of exceptional 
importance. 

If not all the components are present in the expansion of the vector 
$r_0$ with respect to the Jordan canonical basis of the operator AB, then 
the vanishing of the discrepancy occurs earlier than at the nth step. 
The process terminates especially rapidly if AB has a simple 
structure and a large number of coinciding eigenvalues. That is, if in 
the expansion of $r_0$ with respect to the eigenvectors of the matrix of 
AB the nonzero components correspond to m mutually distinct 
eigenvalues, then $r_m = 0$. 
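
The early termination is easy to observe numerically. The sketch below is an illustration only: it assumes the simplest case B = C = E with a real diagonal positive definite A that has just two distinct eigenvalues, and uses the coefficient forms $a_i = -(r_{i-1}, r_{i-1})/(As_i, s_i)$ and $b_i = (r_i, r_i)/(r_{i-1}, r_{i-1})$, which are valid in that case; the function name is ours.

```python
def cg_residual_norms(diag, b, steps):
    """Scheme (105.12) with B = C = E for A = diag(...), starting from x_0 = 0.
    Returns the Euclidean norms of r_0, r_1, ..., r_steps."""
    r = [-v for v in b]                       # r_0 = A*0 - b
    s = list(r)                               # s_1 = r_0
    norms = [sum(v * v for v in r) ** 0.5]
    for _ in range(steps):
        As = [d * v for d, v in zip(diag, s)]
        rr = sum(v * v for v in r)
        a = -rr / sum(p * q for p, q in zip(As, s))
        r_new = [p + a * q for p, q in zip(r, As)]
        bcoef = sum(v * v for v in r_new) / rr
        s = [p + bcoef * q for p, q in zip(r_new, s)]
        r = r_new
        norms.append(sum(v * v for v in r) ** 0.5)
    return norms

# Five unknowns, but only m = 2 distinct eigenvalues (2 and 5).
norms = cg_residual_norms([2.0, 2.0, 2.0, 5.0, 5.0], [1.0, 2.0, 3.0, 4.0, 5.0], 2)
```

Here `norms[1]` is still of order one, while `norms[2]` is zero up to rounding, in agreement with $r_m = 0$ for m = 2.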

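By Theorem 104.1 trinomial relations of the type (104.5) must 
hold for vectors $s_i$ and $r_i$. They can be obtained directly from (105.4) 
and (105.10). Namely, 

$$\begin{aligned}
s_{i+1} &= a_i ABs_i + (1 + b_i)s_i - b_{i-1}s_{i-1}, && i > 1, \\
r_{i+1} &= a_{i+1}ABr_i + \Big(1 + \frac{a_{i+1}b_i}{a_i}\Big) r_i
- \frac{a_{i+1}b_i}{a_i}\, r_{i-1}, && i \ge 1.
\end{aligned} \tag{105.14}$$

From these we can obtain other relations, for example this: 

$$x_{i+1} = x_{i-1} + \omega_{i+1}(\alpha_i Br_i + x_i - x_{i-1}),$$

where $\omega_{i+1}$ and $\alpha_i$ are suitably chosen numbers. 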

In view of what has been said concerning sequence (105.13) note 
the following peculiarity of condition (105.9). On the face of it this 
condition differs from those of the type (101.7). But if we take into 
account (101.6), then it is easy to show that (105.9) is in fact also 
a condition of the type (101.7), in relation to the two scalar products 
(CABx, y) and (Cx, y) for that matter. Indeed, observe that the 
adjoint operator in (105.9) is connected with the basic scalar product 
of a unitary space, whereas the orthogonality of vectors $s_i$ and $r_i$ 
is ensured in relation to the scalar products (CABx, y) and (Cx, y) 
respectively: 

$${}_{CAB}(AB)^* = (CAB \cdot AB \cdot (CAB)^{-1})^* = (CABC^{-1})^* = \alpha E + \beta AB,$$
$${}_{C}(AB)^* = (CABC^{-1})^* = \alpha E + \beta AB.$$

The implementation of methods of conjugate directions can be 
prevented only by the vanishing of one of the scalar products, 
$(CABs_i, s_i)$ or $(Cr_{i-1}, r_{i-1})$, before the discrepancy vanishes. If 
$(CABs_i, s_i) = 0$, then the coefficients $a_i$ and $b_i$ cannot be computed. 
If, however, $(Cr_{i-1}, r_{i-1}) = 0$, then this leads to a zero coefficient $a_i$, 
to the coincidence of the nonzero discrepancies $r_{i-1}$ and $r_i$ and hence 
to the equation $(CABs_{i+1}, s_{i+1}) = 0$ holding. Such a situation can 
be avoided by choosing a new initial vector $x_0$. If the operators CAB 
and C are positive definite, then the above degenerations are 
impossible and the computational process runs without complications. If 
CAB is positive definite, then the methods of conjugate directions 
acquire further interesting properties. 

The closeness of a vector z to the solution of (105.1) can be judged 
by the smallness of the square of some norm of the difference 
$e = x - z$. To that end it is convenient to use the so-called generalized 
error functionals of the form (Re, e), where R is any positive definite 
operator, for example $B^{-1*}CA$. That operator is positive definite, 
since it is connected with the operator CAB by the relation 
$B^{-1*}CA = B^{-1*}(CAB)B^{-1}$. We have 

Theorem 105.1. If an operator CAB is positive definite, then among 
the totality of vectors of the form $z = x_0 + Bs$, where s is in the span 
of vectors $s_1, \ldots, s_i$, a vector $x_i$ gives a minimum of the generalized 
error functional 

$$\varphi(z) = (B^{-1*}CAe, e).$$

Proof. Since CAB is positive definite, the system of vectors $s_i$ is 
CAB-orthogonal. We represent a vector z as a decomposition similar 
to (105.2) for a vector x: 

$$z = x_0 + B\sum_{j=1}^{i} h_j s_j.$$

We have 

$$\begin{aligned}
\varphi(z) &= (B^{-1*}CA(x - z), x - z) \\
&= \Big(B^{-1*}CAB\Big(\sum_{j=1}^{n} a_j s_j - \sum_{j=1}^{i} h_j s_j\Big),
B\Big(\sum_{j=1}^{n} a_j s_j - \sum_{j=1}^{i} h_j s_j\Big)\Big) \\
&= \Big(CAB\Big(\sum_{j=1}^{n} a_j s_j - \sum_{j=1}^{i} h_j s_j\Big),
\sum_{j=1}^{n} a_j s_j - \sum_{j=1}^{i} h_j s_j\Big) \\
&= \sum_{j=1}^{i} |a_j - h_j|^2 (CABs_j, s_j) + \sum_{j=i+1}^{n} |a_j|^2 (CABs_j, s_j).
\end{aligned}$$

From this we conclude that a minimum of the error functional is 
attained for $h_j = a_j$, $j \le i$, i.e. for $z = x_i$. 

The error functional cannot be determined in practical 
calculations, since it depends on the solution x, which is unknown. However, 
it differs only in the presence of a constant term from another 
functional: 

$$\psi(z) = (B^{-1*}CAz, z) - 2\operatorname{Re}(B^{-1*}Cb, z),$$

which can be calculated. Indeed, 

$$\begin{aligned}
\varphi(z) &= (B^{-1*}CA(x - z), x - z) = (B^{-1*}CAx, x)
- (B^{-1*}CAx, z) - (B^{-1*}CAz, x) + (B^{-1*}CAz, z) \\
&= (B^{-1*}CAz, z) - (B^{-1*}Cb, z) - \overline{(B^{-1*}Cb, z)}
+ (B^{-1*}Cb, x) = \psi(z) + (B^{-1*}Cb, x).
\end{aligned}$$

Finally we note some classes of operators for which condition 
(105.9) holds. 

1. All operators A, B and C are Hermitian, with B = C. Condition 
(105.9) holds for $\alpha = 0$ and $\beta = 1$: 

$$(CABC^{-1})^* = (BA)^* = A^*B^* = 0 \cdot E + 1 \cdot AB.$$

2. Operators CAB and C are Hermitian. Condition (105.9) 
again holds for $\alpha = 0$ and $\beta = 1$: 

$$(CABC^{-1})^* = C^{-1*}(CAB)^* = C^{-1}CAB = 0 \cdot E + 1 \cdot AB.$$

3. An operator C is commutative with AB, and AB is normal and 
its spectrum lies on a straight line. The last condition implies that 
$AB = \gamma E + \delta H$ for some Hermitian operator H. We now find 

$$\begin{aligned}
(CABC^{-1})^* &= (CC^{-1}AB)^* = (AB)^* = (\gamma E + \delta H)^* = \bar\gamma E + \bar\delta H \\
&= \frac{\bar\gamma\delta - \gamma\bar\delta}{\delta} E + \frac{\bar\delta}{\delta}(\gamma E + \delta H)
= \frac{2i \operatorname{Im}(\bar\gamma\delta)}{\delta} E + \frac{\bar\delta}{\delta} AB.
\end{aligned}$$

4. Represent an operator A as A = M + N, where M = M* 
and N = -N*. If M is a nonsingular operator, then put 
$B = C = M^{-1}$. Condition (105.9) holds for $\alpha = 2$ and $\beta = -1$: 

$$(CABC^{-1})^* = (M^{-1}(M + N))^* = (M - N)M^{-1}
= 2E - (M + N)M^{-1} = 2 \cdot E - 1 \cdot AB.$$

5. If in the decomposition A = M + N it is N that is a nonsingular 
operator, put $B = C = N^{-1}$. Condition (105.9) holds for $\alpha = 2$ 
and $\beta = -1$: 

$$(CABC^{-1})^* = (N^{-1}(M + N))^* = -(M - N)N^{-1}
= 2E - (M + N)N^{-1} = 2 \cdot E - 1 \cdot AB.$$
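
Condition (105.9) is easy to check numerically for a concrete operator. The following sketch does so for class 4 with hypothetical real 2 x 2 data (so the adjoint is the transpose); the matrices M and N and the helper functions are illustrative choices of ours.

```python
def matmul(X, Y):
    """Product of two 2 x 2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inv2(X):
    """Inverse of a nonsingular 2 x 2 matrix."""
    det = X[0][0] * X[1][1] - X[0][1] * X[1][0]
    return [[X[1][1] / det, -X[0][1] / det], [-X[1][0] / det, X[0][0] / det]]

def transpose(X):
    return [[X[j][i] for j in range(2)] for i in range(2)]

M = [[2.0, 1.0], [1.0, 3.0]]       # M = M^T, nonsingular
N = [[0.0, 4.0], [-4.0, 0.0]]      # N = -N^T
A = [[M[i][j] + N[i][j] for j in range(2)] for i in range(2)]
B = C = inv2(M)

AB = matmul(A, B)
lhs = transpose(matmul(matmul(matmul(C, A), B), inv2(C)))        # (CABC^{-1})^*
rhs = [[2.0 * (i == j) - AB[i][j] for j in range(2)] for i in range(2)]   # 2E - AB
err = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
```

The value of `err` is at rounding level, confirming $\alpha = 2$, $\beta = -1$ for this example.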


Exercises 


1. Prove that the matrix of a bilinear form (CABx, y) is: 
right triangular in a basis $s_1, \ldots, s_n$, 
right almost triangular in a basis $r_0, \ldots, r_{n-1}$, 
left triangular in bases $s_1, \ldots, s_n$ and $r_0, \ldots, r_{n-1}$ if C is a Hermitian 
operator. 

2. How does the form of the matrix of the bilinear form (CABx, y) of 
Exercise 1 change if CAB is a Hermitian operator? 

3. Prove that the matrix of a bilinear form (Cx, y) is: 
right triangular in a basis $r_0, \ldots, r_{n-1}$, 
right triangular in bases $r_0, \ldots, r_{n-1}$ and $s_1, \ldots, s_n$, 
right triangular in bases $ABs_1, \ldots, ABs_n$ and $s_1, \ldots, s_n$, 
right almost triangular in bases $ABr_0, \ldots, ABr_{n-1}$ and $r_0, \ldots, r_{n-1}$. 

4. How does the form of the matrix of the bilinear form (Cx, y) of Exercise 3 
change if C is a Hermitian operator? 

5. Prove that if condition (105.9) is replaced by the following: 

$$(CABC^{-1})^* = \alpha_0 E + \alpha_1 AB + \ldots + \alpha_p (AB)^p, \quad p > 1,$$

then relation (105.10) will be like this: 

$$s_{i+1} = r_i + b_i s_i + b_{i-1} s_{i-1} + \ldots + b_{i-p+1} s_{i-p+1}.$$

6. Prove that 

$$a_i = -\frac{(Cr_{i-1}, r_{i-1})}{(CABs_i, s_i)}.$$

7. Prove that if CAB and C are Hermitian operators then 

$$b_i = \frac{(Cr_i, r_i)}{(Cr_{i-1}, r_{i-1})}.$$

8. Prove that if CAB and C are Hermitian and positive definite operators 
then $a_i < 0$ and $b_i > 0$ for every i. 

9. Prove that the matrix of an operator AB in a basis made up of vectors 
$s_1, \ldots, s_n$ or $r_0, \ldots, r_{n-1}$ has a tridiagonal form. 

10. How do the methods of conjugate directions run if A is a singular 
operator? 


106. Main variants 

We discuss the best known variants of the 
methods of conjugate directions. Theoretically they all fit into 
scheme (105.12) described above. Practical calculations, however, 
are sometimes carried out using somewhat different algorithms. 

Method of conjugate gradients. In this method A is a Hermitian 
positive definite operator, B = C = E and (105.9) holds for $\alpha = 0$ 
and $\beta = 1$. The positive definiteness of the operators CAB = A and 
C = E guarantees that the computational process runs without 
degeneration. At each step of the method the error functional with 
matrix A is minimized. The computational scheme of the method 
has the form 

$$\begin{aligned}
s_1 &= r_0, \\
r_i &= r_{i-1} + a_i As_i, \\
s_{i+1} &= r_i + b_i s_i, \\
x_i &= x_{i-1} + a_i s_i,
\end{aligned}$$

where 

$$a_i = -\frac{(r_{i-1}, r_{i-1})}{(As_i, r_{i-1})} = -\frac{(r_{i-1}, s_i)}{(As_i, s_i)} < 0, \qquad
b_i = -\frac{(r_i, As_i)}{(As_i, s_i)} = \frac{(r_i, r_i)}{(r_{i-1}, r_{i-1})} > 0.$$

In the method of conjugate gradients vectors $r_i$ form an orthogonal 
system, and vectors $s_i$ form an A-orthogonal system. 
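
As an illustration, the scheme of the method of conjugate gradients can be written out in Python for a real symmetric positive definite matrix. The sketch keeps the sign convention of the text (r = Ax - b, hence negative $a_i$); the function name, the tolerance and the test system are assumptions of ours.

```python
def conjugate_gradients(A, b, x0, tol=1e-12):
    """Method of conjugate gradients in the sign convention of the text:
    r = A x - b, so the step coefficients a_i are negative."""
    n = len(b)
    x = list(x0)
    r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
    s = list(r)                                    # s_1 = r_0
    rr = sum(v * v for v in r)                     # (r_{i-1}, r_{i-1})
    for _ in range(n):
        if rr ** 0.5 <= tol:
            break
        As = [sum(A[i][j] * s[j] for j in range(n)) for i in range(n)]
        a = -rr / sum(p * q for p, q in zip(As, s))    # a_i < 0
        x = [xi + a * si for xi, si in zip(x, s)]
        r = [ri + a * w for ri, w in zip(r, As)]
        rr_new = sum(v * v for v in r)
        s = [ri + (rr_new / rr) * si for ri, si in zip(r, s)]   # b_i > 0
        rr = rr_new
    return x

# 4x + y = 1, x + 3y = 2 has the solution x = 1/11, y = 7/11.
x = conjugate_gradients([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [0.0, 0.0])
```

The coefficient $a_i$ is computed as $-(r_{i-1}, r_{i-1})/(As_i, s_i)$, which coincides with the forms given in the text because $(r_{i-1}, s_i) = (r_{i-1}, r_{i-1})$ by (105.5).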

Method of AA*-minimum iterations. In this method A is an 
arbitrary nonsingular operator, B = A*, C = E and (105.9) holds 
for $\alpha = 0$ and $\beta = 1$. The positive definiteness of CAB = AA* and 
C = E guarantees that the computational process runs without 
degeneration. At each step of the method the error functional with 
matrix E, i.e. the square of the Euclidean norm of the error, is 
minimized. The computational scheme here has the form 

$$\begin{aligned}
u_1 &= A^*r_0, \\
r_i &= r_{i-1} + a_i Au_i, \\
u_{i+1} &= A^*r_i + b_i u_i, \\
x_i &= x_{i-1} + a_i u_i,
\end{aligned}$$

where 

$$a_i = -\frac{(r_{i-1}, r_{i-1})}{(Au_i, r_{i-1})} = -\frac{(r_{i-1}, r_{i-1})}{(u_i, u_i)}, \qquad
b_i = -\frac{(r_i, Au_i)}{(u_i, u_i)} = \frac{(r_i, r_i)}{(r_{i-1}, r_{i-1})} > 0.$$

In the method of AA*-minimum iterations vectors $r_i$ and $u_i$ form 
orthogonal systems. 

Method of A*A-minimum iterations. In this method A is an 
arbitrary nonsingular operator, B = A*, C = AA* and (105.9) holds 
for $\alpha = 0$ and $\beta = 1$. The positive definiteness of $CAB = (AA^*)^2$ 
and C = AA* guarantees that the computational process runs 
without degeneration. At each step of the method the error functional 
with matrix A*A, i.e. the square of the Euclidean norm of the 
discrepancy vector, is minimized. The computational scheme of the 
method has the form 

$$\begin{aligned}
u_1 &= A^*r_0, \\
r_i &= r_{i-1} + a_i Au_i, \\
u_{i+1} &= A^*r_i + b_i u_i, \\
x_i &= x_{i-1} + a_i u_i,
\end{aligned}$$

where 

$$a_i = -\frac{(A^*r_{i-1}, A^*r_{i-1})}{(Au_i, Au_i)}, \qquad
b_i = \frac{(A^*r_i, A^*r_i)}{(A^*r_{i-1}, A^*r_{i-1})} > 0.$$

In the method of A*A-minimum iterations vectors $A^*r_i$ and $Au_i$ 
form orthogonal systems. 

Method of complete Hermitian decomposition. In this method A 
is an arbitrary nonsingular operator. We represent it as a sum 
A = M + N, where M = M* and N = -N*. If either M or N is 
nonsingular, we set $B = C = M^{-1}$ or $B = C = N^{-1}$ respectively. 
Condition (105.9) holds for $\alpha = 2$ and $\beta = -1$. If M (or iN) is an 
operator of constant sign, the process runs without degeneration. 
For example, let M > 0 and $B = C = M^{-1}$. The operator C will 
be positive definite and therefore $(Cz, z) = (M^{-1}z, z) > 0$ for any 
nonzero vector z. Consider now the operator $CAB = M^{-1} + M^{-1}NM^{-1}$. 
For any $z \ne 0$ 

$$(CABz, z) = (M^{-1}z, z) + (M^{-1}NM^{-1}z, z) \ne 0,$$

since the first scalar product at the right is real and, in view of the 
positive definiteness of the operator $M^{-1}$, positive and the second scalar 
product is pure imaginary in view of the fact that the operator 
$M^{-1}NM^{-1}$ is skew-Hermitian. For the case $B = C = M^{-1}$ the 
computational scheme of the method has the form 

$$\begin{aligned}
Mu_1 &= r_0, \\
r_i &= r_{i-1} + a_i Au_i, \\
Mv_i &= r_i, \\
u_{i+1} &= v_i + b_i u_i, \\
x_i &= x_{i-1} + a_i u_i,
\end{aligned}$$

where 

$$a_i = -\frac{(r_{i-1}, u_i)}{(Au_i, u_i)}, \qquad b_i = \frac{(v_i, Au_i)}{(Au_i, u_i)}.$$

If $B = C = N^{-1}$, then the computational scheme and the formulas 
for the coefficients $a_i$ and $b_i$ remain the same, except that M is 
replaced by N of course. 
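
The scheme for $B = C = M^{-1}$ can be tried out on a small real system. In the sketch below the symmetric part M is assumed diagonal, purely so that the equations $Mu_1 = r_0$ and $Mv_i = r_i$ reduce to componentwise division; the function name and the test data are ours.

```python
def hermitian_decomposition_solve(A, m_diag, b, x0, tol=1e-12):
    """Complete Hermitian decomposition with B = C = M^{-1}, real case,
    assuming for simplicity that M is diagonal (m_diag holds its diagonal)."""
    n = len(b)
    x = list(x0)
    r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
    u = [ri / mi for ri, mi in zip(r, m_diag)]             # M u_1 = r_0
    for _ in range(n):
        if sum(v * v for v in r) ** 0.5 <= tol:
            break
        Au = [sum(A[i][j] * u[j] for j in range(n)) for i in range(n)]
        denom = sum(p * q for p, q in zip(Au, u))           # (A u_i, u_i)
        a = -sum(p * q for p, q in zip(r, u)) / denom
        x = [xi + a * ui for xi, ui in zip(x, u)]
        r = [ri + a * w for ri, w in zip(r, Au)]
        v = [ri / mi for ri, mi in zip(r, m_diag)]          # M v_i = r_i
        bcoef = sum(p * q for p, q in zip(v, Au)) / denom
        u = [vi + bcoef * ui for vi, ui in zip(v, u)]
    return x

# A = M + N with M = diag(2, 3) and the skew-symmetric N = [[0, 1], [-1, 0]].
A = [[2.0, 1.0], [-1.0, 3.0]]
x = hermitian_decomposition_solve(A, [2.0, 3.0], [3.0, 2.0], [0.0, 0.0])
```

For this system the exact solution is (1, 1), and the process reaches it in two steps up to rounding.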

Method of incomplete Hermitian decomposition. In this method A 
is a Hermitian positive definite operator. We represent it as a sum 
A = M + N, where M = M* and N = N*. If M is nonsingular, 
then we set $B = C = M^{-1}$. Condition (105.9) holds for $\alpha = 0$ and 
$\beta = 1$. If M is positive definite, the process runs without 
degeneration. At each step of the method the error functional with matrix A 
is minimized. The computational scheme remains the same as in 
the case of the complete decomposition method. 

Speeding up the computational process. As we have already noted, 
the methods of conjugate directions allow a solution to be found 
especially rapidly if the operator AB has few mutually distinct 
eigenvalues. Use is made of this fact to construct various devices 
for speeding up the solution of equation (105.1), which are based 
on the following idea. 

Suppose the operator A can be represented as a sum A = M + N, 
where the operator M determines the "principal" part of A and 
allows a simple solution of equations of the type (105.1) with M 
at the left. Now instead of (105.1) we shall solve the equation 

$$(E + NM^{-1})y = b, \tag{106.1}$$

where Mx = y. If in some reasonable sense the operator M is close 
to A, then most eigenvalues of N and hence of the operator $NM^{-1}$ 
are close to zero or zero. Applying the methods of conjugate 
directions to equation (106.1) in this case results in finding a 
solution rapidly. 
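
The passage from (105.1) to (106.1) is a single substitution. Spelled out, with y = Mx and assuming M nonsingular:

```latex
Ax = (M + N)x = (M + N)M^{-1}y = (E + NM^{-1})y = b .
```

Once y has been found, x is recovered by solving the simple equation Mx = y.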

Observe that it is this idea that underlies the method of incomplete 
Hermitian decomposition, which in many cases proves to be more 
efficient than the classical variant of the method of conjugate 
gradients. It all depends on how lucky one is in decomposing the 
operator A. 

We shall not discuss in detail the computational schemes of 
speeding up processes, since they rely too heavily on the use of particular 
features of the operator A. 

Exercises 

1. Under what conditions is it appropriate to apply 
one variant or another of the method of conjugate directions? 

2. How many iterations are required to implement the different variants of 
the methods of conjugate directions for an operator A of the form E + R, where 
the operator R is of rank r? 

3. Assuming an operator A to be a matrix evaluate the number of 
arithmetical operations required for solving a system of linear algebraic 
equations by the methods of conjugate directions. 

4. The matrix of an operator A is Hermitian and differs from a tridiagonal 
matrix in a small number of its elements. Which of the variants of the methods 
of conjugate directions is it appropriate to apply in this case? 

5. Let $P_0(t), P_1(t), \ldots$ be some sequence of polynomials. Choose a 
vector $x_0$ and construct a sequence of vectors $x_0, x_1, \ldots$ by the rule 

$$x_{k+1} = x_k - BP_k(AB)(Ax_k - b), \quad k \ge 0. \tag{106.2}$$

How do the expansions of the discrepancies $r_0, r_1, \ldots$ with respect to the Jordan 
canonical basis of the operator AB change as k increases depending on the choice 
of a sequence of polynomials? 

6. How is sequence (106.2) to be used for the purpose of constructing an 
initial vector for the methods of conjugate directions that ensures that a 
solution is obtained in a smaller number of iterations? 

7. Which of the systems of vectors in each of the particular variants of the 
methods of conjugate directions are up to normalization A-pseudodual? 


107. Operator equations and pseudoduality 

The methods of conjugate directions are not 
the only methods of solving the operator equation 

Ax = b (107.1) 

that rely on the use of bilinear forms. Vast possibilities for creating 
methods arise from constructing systems of vectors dual or 
pseudodual to some bilinear form related to the operator A of equation 
(107.1). 

We again assume that A is a nonsingular operator in a unitary 
space $K_n$. Consider a bilinear form (Ax, y) and suppose that systems 
of vectors $u_1, u_2, \ldots, u_n$ and $v_1, v_2, \ldots, v_n$, A-pseudodual up 
to normalization, have been obtained for it in some way, i.e. that 

$$(Au_i, v_i) \ne 0, \qquad (Au_i, v_k) = 0, \quad k < i \tag{107.2}$$

for every $i \le n$. We show that knowing A-pseudodual systems of 
vectors makes it possible to construct the process of finding a 
solution of equation (107.1). 

We choose a vector $x_0$. Since A-pseudodual systems are linearly 
independent, there is a decomposition 

$$x = x_0 + \sum_{j=1}^{n} a_j u_j. \tag{107.3}$$

If 

$$x_i = x_0 + \sum_{j=1}^{i} a_j u_j,$$

then similarly to (105.3) and (105.4) we have 

$$x_i = x_{i-1} + a_i u_i, \qquad r_i = r_{i-1} + a_i Au_i. \tag{107.4}$$


Further 

$$r_i = Ax_i - b = A(x_i - x) = -\sum_{j=i+1}^{n} a_j Au_j,$$

and according to the second conditions in (107.2) we find that 

$$(r_i, v_k) = -\sum_{j=i+1}^{n} a_j (Au_j, v_k) = 0$$

for every $k \le i$. So 

$$(r_i, v_k) = 0 \tag{107.5}$$

for $k \le i$. This makes it possible to determine coefficients $a_i$ from 
(107.4): taking the scalar product of the second relation in (107.4) 
with $v_i$ and using $(r_i, v_i) = 0$, we get 

$$a_i = -\frac{(r_{i-1}, v_i)}{(Au_i, v_i)}. \tag{107.6}$$

According to the first conditions in (107.2) the denominator on the 
right of (107.6) is nonzero. 

It follows from (107.5) that the vector $r_n$ is left orthogonal, and 
by the symmetry of a scalar product simply orthogonal, to the 
linearly independent vectors $v_1, v_2, \ldots, v_n$, i.e. that $r_n = 0$ and 
the vector $x_n$ is a solution of equation (107.1). 
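
Relations (107.4) and (107.6) translate directly into code. The sketch below is an illustration for real data; the function name and the way the systems $u_i$, $v_i$ are passed (as lists of vectors) are our assumptions. As a test case we take a left (lower) triangular matrix A with $u_i = v_i$ equal to the unit coordinate vectors, for which conditions (107.2) clearly hold and the process reduces to forward substitution.

```python
def dual_directions(A, b, U, V, x0):
    """Scheme (107.4), (107.6) for real data. U and V are lists of the
    vectors u_1, ..., u_n and v_1, ..., v_n satisfying (107.2)."""
    n = len(b)
    x = list(x0)
    r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
    for u, v in zip(U, V):
        Au = [sum(A[i][j] * u[j] for j in range(n)) for i in range(n)]
        # (107.6): a_i = -(r_{i-1}, v_i) / (A u_i, v_i)
        a = -sum(p * q for p, q in zip(r, v)) / sum(p * q for p, q in zip(Au, v))
        x = [xi + a * ui for xi, ui in zip(x, u)]
        r = [ri + a * w for ri, w in zip(r, Au)]
    return x

A = [[2.0, 0.0, 0.0], [1.0, 3.0, 0.0], [4.0, 5.0, 6.0]]   # left triangular
unit = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x = dual_directions(A, [2.0, 4.0, 15.0], unit, unit, [0.0, 0.0, 0.0])
```

For this right-hand side the exact solution is (1, 1, 1), and each step fixes one more component of it.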

The above methods of solving equation (107.1) are known as 
methods of dual directions. The number of different methods is 
infinite in the full sense of the word since there are an infinite number 
of different A-pseudodual pairs of systems of vectors. The methods 
of conjugate directions considered earlier obviously belong to this 
group. 

In general there is no analogue of Theorem 105.1 for methods of 
dual directions, even for a positive definite operator A. The 
behaviour of errors $e_k = x - x_k$ in these methods is described only by 
the weak, and yet useful, 

Theorem 107.1. Let $P_k$ be an operator of projection onto the subspace 
spanned by vectors $u_1, \ldots, u_k$ parallel to the subspace spanned by 
vectors $u_{k+1}, \ldots, u_n$. Then 

$$e_k = (E - P_k)e_0. \tag{107.7}$$

Proof. By formula (107.3) we have the following expansion for 
the error $e_0$ of an initial vector $x_0$: 

$$e_0 = x - x_0 = \sum_{j=1}^{n} a_j u_j.$$

But by the definition of the projection operator 

$$P_k e_0 = \sum_{j=1}^{k} a_j u_j.$$

The right-hand side of this equation is nothing but $x_k - x_0$. 
Therefore 

$$P_k e_0 = x_k - x_0 = (x - x_0) - (x - x_k) = e_0 - e_k,$$

which proves the theorem. 

Interesting results associated with A-pseudodual systems can be 
obtained by considering the matrix interpretation of the above 
methods. 

We assume that the space $K_n$ is not only unitary but also 
arithmetical, which is admissible by virtue of the isomorphism of finite 
dimensional vector spaces. The entire reasoning above remains valid, 
only the terminology is changed: equation (107.1) becomes a system 
of linear algebraic equations, the operators are replaced by matrices, 
and by vectors column vectors are meant. Denote by U (V) a matrix 
whose columns are vectors $u_1, \ldots, u_n$ ($v_1, \ldots, v_n$). Then the fact 
that those vectors satisfy relations (107.2) implies that the matrix 

$$C = V^*AU$$

is nonsingular left triangular. From this we obtain the following 
factorization of the matrix A: 

$$A = V^{-1*}CU^{-1}. \tag{107.8}$$

So a knowledge of A-pseudodual, up to normalization, systems 
of vectors allows us to solve the system of linear algebraic equations 
(107.1) with error estimates (107.7) and to obtain factorization 
(107.8) of the matrix A into factors among which there is one 
triangular factor. We show that the converse is also true. That is, any 
method of solving systems of linear algebraic equations, based on 
factoring a matrix into factors among which there is at least one that is 
triangular, determines some A-pseudodual, up to normalization, 
systems of vectors. Hence implementing such methods according 
to schemes (107.3) to (107.6), it is possible to use estimates (107.7). 

Consider a matrix P obtained from a unit matrix by reversing 
its columns or equivalently its rows. It is easy to verify that 
multiplying an arbitrary matrix C on the right by P reverses the order 
of the columns of C and multiplying a matrix CP on the left by P 
reverses the rows of CP. The elements $f_{ij}$ of the matrix F = PCP are 
therefore connected with the elements $c_{ij}$ of C by the relation 

$$f_{ij} = c_{n-i+1,\,n-j+1}.$$

A number of useful consequences follow. Let us label the diagonals 
of a matrix parallel to the principal diagonal upwards in succession by 
the numbers $-(n-1), -(n-2), \ldots, 0, \ldots, (n-2), (n-1)$. 
The diagonal with the zero index is the principal diagonal. In such 
an indexing the elements of the kth diagonal are defined by the 
relation $j - i = k$. If the matrix C satisfies the conditions 

$$c_{ij} = 0, \quad k < j - i, \quad j - i < l$$

for some numbers $l \le k$, then for the matrix F = PCP 

$$f_{ij} = 0, \quad -l < j - i, \quad j - i < -k.$$

Hence under the transformation F = PCP a diagonal matrix 
remains diagonal, a right (left) triangular matrix becomes left (right) 
triangular, a right (left) bidiagonal matrix becomes left (right) 
bidiagonal, and so on. 
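
The effect of the transformation F = PCP is easy to see on a small example. The sketch below forms F without building P explicitly, by reversing the order of the rows and of the columns, which is what the relation $f_{ij} = c_{n-i+1,\,n-j+1}$ says; the function name and the sample matrix are ours.

```python
def reverse_both(C):
    """F = P C P for the reversal matrix P: rows and columns of C are
    taken in reverse order, i.e. f_ij = c_{n-i+1, n-j+1} (1-based)."""
    return [row[::-1] for row in C[::-1]]

C = [[1, 2, 3],
     [0, 4, 5],
     [0, 0, 6]]            # right triangular
F = reverse_both(C)        # left triangular: [[6, 0, 0], [5, 4, 0], [3, 2, 1]]
```

Applying `reverse_both` twice returns the original matrix, in agreement with $P^2 = E$.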

Suppose now that use is being made of some method of solving 
a system of linear algebraic equations (107.1) based on a preliminary 
factorization of a matrix A: 

$$A = QCR, \tag{107.9}$$

where C is a triangular matrix. It may be assumed without essential 
loss of generality that C is left triangular, since otherwise instead of 
(107.9) we should consider the factorization 

$$A = (QP)(PCP)(PR),$$

where according to the foregoing the matrix PCP must be left 
triangular. The desired matrices U and V defining A-pseudodual, up to 
normalization, systems of vectors $u_1, \ldots, u_n$ and $v_1, \ldots, v_n$ can 
be given by the equations 

$$U = R^{-1}, \qquad V = Q^{-1*}.$$

Observe that in factorization (107.9) generated by some numerical 
method the matrices Q and R are as a rule sufficiently simple. Most 
often they are unitary or triangular matrices as well as matrices 
differing from triangular matrices in having permuted rows and 
columns. Matrices $R^{-1}$ and $Q^{-1*}$ can therefore be found without 
particular difficulties. At all events the total computational costs 
of finding them are much lower than those of obtaining 
factorization (107.9). This is characteristic of such widely known methods 
as the Gauss methods, the square root method, the Jordan method, 
the orthogonalization method, the method of reflections, the method 
of revolutions; methods based on reducing a system to a bidiagonal 
form and on obtaining normed decompositions; methods of conjugate 
directions, and so on. 

Thus most of the existing numerical methods of solving operator 
equations (107.1) in a finite dimensional space are in fact methods of 
constructing A-pseudodual systems of vectors. Despite the diversity 
of their specific modes all these methods can be analyzed from a 
general standpoint based on Theorem 107.1. 


Exercises 


1. Assume an operator to be a matrix and vectors of 
a space to be column vectors. Prove that in the methods of dual directions 
successive errors are related by 

$$e_k = (E - S_k)e_{k-1},$$

where 

$$S_k = \frac{u_k v_k^* A}{v_k^* A u_k}. \tag{107.10}$$

2. Prove that operators $S_k$ satisfy the equations 

$$S_k^2 = S_k, \qquad S_i S_k = 0,$$
$$S_i (E - S_k)(E - S_{k-1}) \ldots (E - S_1) = 0, \quad i < k.$$

3. Prove that the operators $S_k$ of (107.10) and the operator $P_k$ of (107.7) 
are related by 

$$E - P_k = (E - S_k)(E - S_{k-1}) \ldots (E - S_1).$$

4. What role do operators $S_k$ and $P_k$ play in the particular methods defined 
by factorization (107.9)? 

5. How do errors $e_k$ change in the particular methods defined by 
factorization (107.9)? 

6. Which of the known methods of solving systems of linear algebraic 
equations is not based on factorization (107.9)? 

108. Bilinear forms in spectral problems 

Bilinear forms are widely used not only in 
solving operator equations but also in many other problems of linear 
algebra, in particular for determining the eigenvalues of an operator. 
We now discuss two methods of finding the eigenvalues of a 
Hermitian operator in unitary space. 

Let A be a Hermitian operator. It certainly satisfies condition 
(101.7). We choose a vector x, construct a power sequence and 
subject it to the pseudoorthogonalization process. Since the scalar product 
is Hermitian-symmetric, the resulting sequence of vectors $f_1, f_2, \ldots$ 
is orthogonal. By (104.5) 

$$\begin{aligned}
f_1 &= x, \\
f_2 &= Af_1 - \alpha_1 f_1, \\
f_{i+1} &= Af_i - \alpha_i f_i - \beta_{i-1} f_{i-1}, \quad i > 1,
\end{aligned} \tag{108.1}$$

formulas (104.6) assuming the following form by virtue of the 
orthogonality of the system of vectors $f_1, f_2, \ldots, f_n$: 

$$\alpha_i = \frac{(Af_i, f_i)}{(f_i, f_i)}, \qquad
\beta_{i-1} = \frac{(Af_i, f_{i-1})}{(f_{i-1}, f_{i-1})}.$$

The self-adjointness of A allows us to conclude that all coefficients 
$\alpha_i$ and $\beta_i$ are real and that the coefficients $\beta_{i-1}$ are in addition 
positive. Indeed, 

$$(Af_i, f_i) = (f_i, A^*f_i) = (f_i, Af_i) = \overline{(Af_i, f_i)},$$

whence it follows that $\alpha_i$ are real. Further (108.1) yields 

$$(Af_i, f_{i-1}) = (f_i, Af_{i-1}) = (f_i, f_i + \alpha_{i-1}f_{i-1} + \beta_{i-2}f_{i-2}) = (f_i, f_i).$$

Therefore 

$$\beta_{i-1} = \frac{(f_i, f_i)}{(f_{i-1}, f_{i-1})},$$

and the positiveness of $\beta_{i-1}$ follows from the positiveness in unitary 
space of the scalar product of a nonzero vector with itself. 

As already said, the matrix of A in the basis made up of vectors 
$f_1, f_2, \ldots, f_n$ has a tridiagonal form. If in finding a basis $f_1, f_2, \ldots$ 
there were early terminations of the process of orthogonalization 
of the power sequence, then the matrix splits into a number of 
tridiagonal matrices of smaller size. 
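
Process (108.1) can be sketched as follows for a real symmetric matrix. The sketch is an illustration only: it does not handle early termination (a zero vector $f_i$), it uses no normalization or reorthogonalization, and the function names and test data are our assumptions.

```python
def matvec(M, v):
    """Product of a square matrix (list of rows) and a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def dot(u, v):
    """Real scalar product (u, v)."""
    return sum(a * b for a, b in zip(u, v))

def orthogonalize_power_sequence(A, x):
    """Process (108.1) for a real symmetric A. Returns the vectors f_1..f_n,
    the coefficients alpha_1..alpha_n and beta_1..beta_{n-1}."""
    n = len(x)
    fs, alphas, betas = [list(x)], [], []
    for i in range(n):
        f = fs[i]
        Af = matvec(A, f)
        alpha = dot(Af, f) / dot(f, f)
        alphas.append(alpha)
        g = [p - alpha * q for p, q in zip(Af, f)]
        if i > 0:
            prev = fs[i - 1]
            beta = dot(Af, prev) / dot(prev, prev)
            betas.append(beta)
            g = [p - beta * q for p, q in zip(g, prev)]
        if i < n - 1:
            fs.append(g)            # f_{i+1}
    return fs, alphas, betas

A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
fs, alphas, betas = orthogonalize_power_sequence(A, [1.0, 1.0, 1.0])
gram_offdiag = max(abs(dot(fs[i], fs[j])) for i in range(3) for j in range(i))
```

For this example the vectors `fs` come out mutually orthogonal, all `betas` are positive, and the sum of `alphas` equals the trace of A, as it must for the diagonal of a matrix similar to A.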

The positiveness of $\beta_i$ allows the form of tridiagonal matrix (104.8) 
to be simplified. Take a nonsingular diagonal matrix D with 
elements $d_i$ and perform a similarity transformation with matrix (104.8). 
In the matrix $D^{-1}A_fD$ the diagonal elements will remain the same 
as in the matrix $A_f$, the elements $\beta_i d_i^{-1} d_{i+1}$ will take the place of 
the off-diagonal elements $\beta_i$, and the elements symmetric with 
respect to them will equal $d_i d_{i+1}^{-1}$. If we take 

$$d_1 = 1, \qquad d_{i+1} = \frac{d_i}{\sqrt{\beta_i}}, \quad i \ge 1, \tag{108.2}$$

then the matrix D is real and the matrix $D^{-1}A_fD$ is real symmetric 
tridiagonal. Since $\beta_i$ are positive, by (108.2) the diagonal elements 
of D and hence the off-diagonal elements of $D^{-1}A_fD$ are also positive. 

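Thus the problem of determining the eigenvalues and eigenvectors 
of any Hermitian operator can always be reduced to that of 
determining the eigenvalues and eigenvectors of a real symmetric tridiagonal 
matrix A with nonzero off-diagonal elements: 

$$A = \begin{pmatrix}
\alpha_1 & \gamma_1 & & & & 0 \\
\gamma_1 & \alpha_2 & \gamma_2 & & & \\
 & \gamma_2 & \alpha_3 & \gamma_3 & & \\
 & & \ddots & \ddots & \ddots & \\
 & & & \gamma_{n-2} & \alpha_{n-1} & \gamma_{n-1} \\
0 & & & & \gamma_{n-1} & \alpha_n
\end{pmatrix}. \tag{108.3}$$

Such matrices are called Jacobian matrices. 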

One of the most efficient numerical methods of finding the 
eigenvalues of a Jacobian matrix is based on the law of inertia for 
quadratic forms. But before proceeding to describe it we consider some 
properties of Jacobian matrices. 

Theorem 108.1. All eigenvalues of a Jacobian matrix are simple. 

Proof. Suppose the eigenvalue $\lambda$ is multiple. Then the rank of the 
matrix $A - \lambda E$ must not be greater than n - 2. But it is clearly 
not less than n - 1, since the off-diagonal elements are nonzero 
and hence so is the minor in the first n - 1 columns and the last 
n - 1 rows. The contradiction obtained proves the theorem. 

Corollary. If a symmetric tridiagonal matrix has an eigenvalue of 
multiplicity p, then there are at least p - 1 elements equal to zero 
among its upper off-diagonal elements. 

If $A$ is a Jacobian matrix, then the matrix $A - \lambda E$ remains 
Jacobian for all real $\lambda$. Denote by $\sigma_1(\lambda), \ldots, \sigma_n(\lambda)$ 
the principal minors of $A - \lambda E$ (the minor $\sigma_r$ lies in the 
first $r$ rows and columns). It is clear that for all $r$, $\sigma_r(\lambda)$ 
is a polynomial of degree $r$ coinciding up to a sign with the 
characteristic polynomial of the matrix of the minor of order $r$ for $A$. 
Since $A$ is a symmetric Jacobian matrix, all the roots of the polynomials 
$\sigma_r(\lambda)$ are real and simple. Expanding a minor $\sigma_r(\lambda)$ 
by the last row or the last column we obtain the following recurrence 
relations: 

$$
\begin{aligned}
\sigma_0(\lambda) &= 1, \qquad \sigma_1(\lambda) = \alpha_1 - \lambda, \\
\sigma_r(\lambda) &= (\alpha_r - \lambda)\,\sigma_{r-1}(\lambda) - \gamma_{r-1}^2\,\sigma_{r-2}(\lambda), \qquad 2 \le r \le n.
\end{aligned}
\tag{108.4}
$$
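
The recurrence (108.4) translates almost literally into code. The following is a minimal sketch, not taken from the text: the function name `sturm_sequence` and the storage of the diagonal and off-diagonal elements in two plain lists are choices made for this illustration only.

```python
def sturm_sequence(alpha, gamma, lam):
    """Return [sigma_0(lam), ..., sigma_n(lam)] computed from (108.4).

    alpha -- diagonal elements alpha_1, ..., alpha_n of the Jacobian matrix
    gamma -- off-diagonal elements gamma_1, ..., gamma_{n-1}
    lam   -- the real number at which the minors are evaluated
    """
    sigma = [1.0, alpha[0] - lam]  # sigma_0 = 1, sigma_1 = alpha_1 - lam
    for r in range(1, len(alpha)):
        # sigma_{r+1} = (alpha_{r+1} - lam) sigma_r - gamma_r^2 sigma_{r-1}
        sigma.append((alpha[r] - lam) * sigma[-1] - gamma[r - 1] ** 2 * sigma[-2])
    return sigma


# Principal minors of the 3 x 3 matrix with diagonal 2 and off-diagonal 1.
print(sturm_sequence([2.0, 2.0, 2.0], [1.0, 1.0], 0.0))
```

For matrices of large order the values may overflow or underflow in floating point arithmetic; practical programs usually rescale the sequence, which this sketch does not attempt.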

Theorem 108.2. For all $r > 1$ no polynomials $\sigma_r(\lambda)$ and 
$\sigma_{r-1}(\lambda)$ have common roots. 

Proof. Suppose for some $r$ the number $\lambda$ is a common root of the 
polynomials $\sigma_r(\lambda)$ and $\sigma_{r-1}(\lambda)$. Then it follows 
from (108.4) that $\lambda$ is a root of $\sigma_{r-2}(\lambda)$, since 
$\gamma_{r-1} \ne 0$. Proceeding with this reasoning we arrive at the 
conclusion that $\lambda$ is a root of $\sigma_0(\lambda)$. But $\sigma_0(\lambda)$ 
has no roots, and therefore no adjacent polynomials have common roots. 

Corollary. If $\lambda$ is a root of a polynomial $\sigma_r(\lambda)$, then 
$\sigma_{r-1}(\lambda)$ and $\sigma_{r+1}(\lambda)$ are nonzero and have 
opposite signs. 

Let $\lambda$ be a root of none of the polynomials $\sigma_1(\lambda), \ldots, \sigma_n(\lambda)$. 
Calculate for that $\lambda$ the values of all polynomials from formulas 
(108.4) and consider the alternation of signs in the sequence 

$$\sigma_0(\lambda),\ \sigma_1(\lambda),\ \sigma_2(\lambda),\ \ldots,\ \sigma_n(\lambda). \tag{108.5}$$

Taking into account the law of inertia and the connection between 
the principal minors of a matrix, the coefficients of the canonical 
form and the eigenvalues we can say that the number $n_-(\lambda)$ of 
sign alternations in sequence (108.5) equals the number of the 
eigenvalues of the matrix $A$ strictly less than $\lambda$. 

The presence of zero terms in (108.5) does not lead to any difficulties. 
Indeed, choosing a positive $\varepsilon$ sufficiently small, we can make 
all polynomials $\sigma_r(\lambda - \varepsilon)$ nonzero, while maintaining 
the signs of the nonzero minors of (108.5). According to the corollary of 
Theorem 108.2 the signs of the minors that were zero do not affect the 
general number of sign alternations in (108.5). Therefore all zero terms of 
sequence (108.5), except $\sigma_n(\lambda)$, may be assigned arbitrary signs. 
If, however, $\sigma_n(\lambda) = 0$, then $\lambda$ is an eigenvalue of $A$. 
The number of sign alternations $n_-(\lambda)$ in sequence (108.5) without 
$\sigma_n(\lambda)$ is again equal to the number of the eigenvalues of $A$ 
strictly less than $\lambda$. 
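
The counting rule, including the treatment of zero terms, can be sketched as follows. This is an illustration under stated assumptions rather than a prescribed implementation: the name `count_below` is hypothetical, and zero terms are simply skipped, which by the corollary of Theorem 108.2 gives the same count as assigning them an arbitrary sign and also discards a vanishing last term.

```python
def count_below(alpha, gamma, lam):
    """Number of eigenvalues strictly less than lam for the Jacobian matrix
    with diagonal alpha (length n) and off-diagonal gamma (length n - 1)."""
    count = 0
    last_sign = 1            # sign of sigma_0 = 1
    prev2, prev1 = 0.0, 1.0  # sigma_{r-2} and sigma_{r-1}
    for r in range(len(alpha)):
        g2 = gamma[r - 1] ** 2 if r > 0 else 0.0
        cur = (alpha[r] - lam) * prev1 - g2 * prev2  # recurrence (108.4)
        if cur != 0:
            sign = 1 if cur > 0 else -1
            if sign != last_sign:
                count += 1   # one more sign alternation
            last_sign = sign
        # a zero term is skipped: it never changes the number of alternations
        prev2, prev1 = prev1, cur
    return count


# The matrix with diagonal (2, 2) and off-diagonal 1 has eigenvalues 1 and 3.
print(count_below([2.0, 2.0], [1.0], 2.0))
```

Skipping zeros keeps the function short; a version that assigns signs explicitly would return the same numbers.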

Suppose the eigenvalues $\lambda_k$ of $A$ are indexed in algebraically 
decreasing order, i.e. 

$$\lambda_1 > \lambda_2 > \ldots > \lambda_n. \tag{108.6}$$

Since $n_-(\lambda)$ counts the eigenvalues lying below $\lambda$, it is 
convenient to count from the smallest eigenvalue. We show how to determine 
the $k$th eigenvalue from below, that is, $\lambda_{n-k+1}$ in the numbering 
(108.6). Let numbers $a_0$ and $b_0$ be known such that 

$$b_0 > a_0, \qquad n_-(a_0) < k, \qquad n_-(b_0) \ge k.$$

Then $\lambda_{n-k+1}$ is clearly in the half-interval $[a_0, b_0)$. Observe 
that we may take any number less than $-\|A\|$ as $a_0$ and any number 
greater than $\|A\|$ as $b_0$. Now set 

$$c_0 = \tfrac{1}{2}(a_0 + b_0)$$

and determine $n_-(c_0)$. If $n_-(c_0) < k$, then $\lambda_{n-k+1}$ is in the 
half-interval $[c_0, b_0)$. If $n_-(c_0) \ge k$, then $\lambda_{n-k+1}$ is in 
$[a_0, c_0)$. Therefore we can always find a half-interval, half the 
previous one, containing $\lambda_{n-k+1}$. Proceeding with this process we 
obtain a system of embedded half-intervals $[a_i, b_i)$ containing 
$\lambda_{n-k+1}$, with 

$$b_i - a_i = 2^{-i}(b_0 - a_0).$$

This allows the eigenvalue to be localized to any required precision. 
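
The halving process can be sketched in a few lines. The code below is an illustration only: the name `kth_smallest_eigenvalue`, the fixed number of halving steps and the use of the maximum absolute row sum as a bound for $\|A\|$ are assumptions of this sketch. Eigenvalues are counted from the smallest, so `k = 1` gives the least eigenvalue.

```python
def kth_smallest_eigenvalue(alpha, gamma, k, steps=60):
    """Bisection for the k-th eigenvalue from below of a Jacobian matrix."""
    n = len(alpha)

    def count_below(lam):
        # sign alternations in the Sturm sequence (108.5); zeros are skipped
        count, last_sign = 0, 1
        prev2, prev1 = 0.0, 1.0
        for r in range(n):
            g2 = gamma[r - 1] ** 2 if r > 0 else 0.0
            cur = (alpha[r] - lam) * prev1 - g2 * prev2
            if cur != 0:
                sign = 1 if cur > 0 else -1
                count += sign != last_sign
                last_sign = sign
            prev2, prev1 = prev1, cur
        return count

    # The maximum absolute row sum bounds every eigenvalue in modulus.
    bound = 0.0
    for i in range(n):
        left = abs(gamma[i - 1]) if i > 0 else 0.0
        right = abs(gamma[i]) if i < n - 1 else 0.0
        bound = max(bound, abs(alpha[i]) + left + right)

    a, b = -bound - 1.0, bound + 1.0  # count_below(a) < k <= count_below(b)
    for _ in range(steps):
        c = 0.5 * (a + b)
        if count_below(c) < k:
            a = c  # the eigenvalue lies in [c, b)
        else:
            b = c  # the eigenvalue lies in [a, c)
    return 0.5 * (a + b)


# Eigenvalues of the matrix with diagonal (2, 2) and off-diagonal 1 are 1 and 3.
print(kth_smallest_eigenvalue([2.0, 2.0], [1.0], 1))
print(kth_smallest_eigenvalue([2.0, 2.0], [1.0], 2))
```

A fixed number of steps is used instead of a tolerance test so that the loop always terminates; after $i$ steps the half-interval has length $2^{-i}(b_0 - a_0)$, as in the text.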

The described method of determining the eigenvalues of a tridiagonal 
matrix is called the method of bisections and sequence (108.5) 
is called the Sturm sequence. Once any eigenvalue $\lambda$ has been 
determined, the eigenvectors relating to it are determined as solutions of 
a homogeneous system of linear algebraic equations with the matrix 
$A - \lambda E$. 
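
For a Jacobian matrix the homogeneous system with the matrix $A - \lambda E$ can be solved by a simple recurrence: the first component of an eigenvector cannot vanish (otherwise the first $n - 1$ equations would force all components to be zero), so one may set it to 1 and find the remaining components from the first $n - 1$ equations. The sketch below illustrates this; the name `eigenvector_by_recurrence` is hypothetical, and the recurrence is known to be sensitive to rounding errors for large matrices, so it should be read as an illustration in exact-arithmetic spirit, not as a robust method.

```python
def eigenvector_by_recurrence(alpha, gamma, lam):
    """Solve (A - lam E) x = 0 for a Jacobian matrix, normalizing x_1 = 1.

    Assumes that lam is (numerically) an eigenvalue and that all gamma[i]
    are nonzero, as required of a Jacobian matrix.
    """
    n = len(alpha)
    x = [1.0]
    if n > 1:
        # first equation: (alpha_1 - lam) x_1 + gamma_1 x_2 = 0
        x.append(-(alpha[0] - lam) * x[0] / gamma[0])
    for i in range(1, n - 1):
        # equation i+1: gamma_i x_i + (alpha_{i+1} - lam) x_{i+1} + gamma_{i+1} x_{i+2} = 0
        x.append(-(gamma[i - 1] * x[i - 1] + (alpha[i] - lam) * x[i]) / gamma[i])
    return x


# Largest eigenvalue of the 3 x 3 matrix with zero diagonal and unit
# off-diagonal is sqrt(2); the eigenvector is proportional to (1, sqrt(2), 1).
print(eigenvector_by_recurrence([0.0, 0.0, 0.0], [1.0, 1.0], 2 ** 0.5))
```

The last equation of the system is not used in the recurrence; its residual gives a check on how accurately `lam` approximates an eigenvalue.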

Another method of finding the eigenvalues of a Hermitian operator 
$A$ is based on some extreme properties of the related quadratic 
form $(Ax, x)$. 

It is again assumed that the eigenvalues of $A$ are arranged according 
to (108.6). Suppose that for the eigenvalues $\lambda_{i_1}, \lambda_{i_2}, \ldots, \lambda_{i_r}$ 
orthogonal eigenvectors $x_{i_1}, x_{i_2}, \ldots, x_{i_r}$ are known. Denote 
by $L_r$ the span of those eigenvectors and let $\lambda_{r_1}$ be the 
largest of the remaining eigenvalues of $A$. 

Theorem 108.3. 

$$\lambda_{r_1} = \max_{\substack{x \ne 0 \\ x \perp L_r}} \frac{(Ax, x)}{(x, x)}, \tag{108.7}$$

and any vector for which a maximum is attained will be an eigenvector 
corresponding to $\lambda_{r_1}$. 

Proof. The quotient on the right of (108.7) will remain unchanged 
if the vector $x$ is multiplied by any nonzero number. Instead of 
(108.7) we may therefore investigate the following equation: 

$$\lambda_{r_1} = \max_{\substack{(x, x) = 1 \\ x \perp L_r}} (Ax, x). \tag{108.8}$$

Choose in $K_n$ an orthonormal basis made up of eigenvectors of the 
operator $A$, including the normalized vectors $x_{i_1}, \ldots, x_{i_r}$. 
Then in this basis equation (108.8) assumes the following form: 

$$\lambda_{r_1} = \max_{\substack{|\alpha_1|^2 + \ldots + |\alpha_n|^2 = 1 \\ \alpha_{i_1} = \ldots = \alpha_{i_r} = 0}} \ \sum_{j=1}^{n} \lambda_j |\alpha_j|^2, \tag{108.9}$$

where $\alpha_1, \ldots, \alpha_n$ are the coordinates of a vector $x$ in the 
expansion with respect to the chosen basis. 

The unitary space $K_n$ is complete, the set of vectors satisfying 
the condition $(x, x) = 1$ is closed and bounded, and therefore there are 
vectors on which the maximum in (108.8) is attained. Let 
$\hat\alpha_1, \ldots, \hat\alpha_n$ be the coordinates of one of such vectors. 
We show that if $\lambda_j \ne \lambda_{r_1}$, then $\hat\alpha_j = 0$. Indeed, 
if $\lambda_j > \lambda_{r_1}$ the equation $\hat\alpha_j = 0$ follows from the 
condition $x \perp L_r$, since eigenvalues greater than $\lambda_{r_1}$ may 
exist only among $\lambda_{i_1}, \ldots, \lambda_{i_r}$. If $\lambda_j < \lambda_{r_1}$, 
then the coordinate $\hat\alpha_j$ cannot be nonzero, since otherwise we 
could obtain a larger value of the sum in (108.9) by taking 
$|\hat\alpha_{r_1}|^2 + |\hat\alpha_j|^2$ and $0$ instead of the numbers 
$|\hat\alpha_{r_1}|^2$ and $|\hat\alpha_j|^2$. 

Thus the largest value of the sum in (108.9) may be attained only 
for those $\hat\alpha_1, \ldots, \hat\alpha_n$ for which $\hat\alpha_j \ne 0$ only 
when $\lambda_j = \lambda_{r_1}$. Hence the maximum value of the quotient in 
(108.7) is attained on eigenvectors corresponding to $\lambda_{r_1}$. Of 
course equation (108.7) follows too. 

This theorem shows a way of constructing a numerical method of 
finding the eigenvalues and the eigenvectors of the operator $A$, 
which is based on seeking the maxima of the function 

$$\theta(x) = \frac{(Ax, x)}{(x, x)}$$

called the Rayleigh quotient. We shall restrict ourselves to a brief 
discussion of the method. 

Take a vector $x$ and consider the behaviour of the Rayleigh quotient 
in the neighbourhood of that vector. Simple transformations 
show that for any vector $l$ and a small real $\varepsilon$ the following 
representation holds: 

$$\theta(x + \varepsilon l) = \theta(x) + \frac{2\varepsilon}{(x, x)} \operatorname{Re}\,(Ax - \theta(x)\,x,\ l) + O(\varepsilon^2). \tag{108.10}$$

If $\operatorname{Re}\,(Ax - \theta(x)\,x,\ l) \ne 0$, then for sufficiently 
small real $\varepsilon$ whose sign coincides with that of the real part of 
the scalar product we get $\theta(x + \varepsilon l) > \theta(x)$, the 
inequality being opposite for small $\varepsilon$ of the opposite sign. 
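
The "simple transformations" behind (108.10) can be written out as follows. This is a sketch of one possible derivation; it uses only the fact that $A$ is Hermitian, so that $(Al, x) = \overline{(Ax, l)}$ and $\theta(x)$ is real, and that $\varepsilon$ is real.

```latex
\begin{align*}
(A(x + \varepsilon l),\, x + \varepsilon l)
  &= (Ax, x) + 2\varepsilon \operatorname{Re}\,(Ax, l) + \varepsilon^2 (Al, l), \\
(x + \varepsilon l,\, x + \varepsilon l)
  &= (x, x) + 2\varepsilon \operatorname{Re}\,(x, l) + \varepsilon^2 (l, l), \\
\theta(x + \varepsilon l)
  &= \frac{(Ax, x) + 2\varepsilon \operatorname{Re}\,(Ax, l) + O(\varepsilon^2)}
          {(x, x)\left(1 + \dfrac{2\varepsilon \operatorname{Re}\,(x, l)}{(x, x)} + O(\varepsilon^2)\right)} \\
  &= \theta(x) + \frac{2\varepsilon}{(x, x)}
     \bigl[\operatorname{Re}\,(Ax, l) - \theta(x) \operatorname{Re}\,(x, l)\bigr] + O(\varepsilon^2) \\
  &= \theta(x) + \frac{2\varepsilon}{(x, x)}
     \operatorname{Re}\,(Ax - \theta(x)\,x,\ l) + O(\varepsilon^2).
\end{align*}
```

The fourth line uses the expansion $1/(1 + t) = 1 - t + O(t^2)$ for small $t$.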
Suppose the vector $x$ is not an eigenvector. Then $Ax - \theta(x)\,x \ne 0$ 
and we may set 

$$l = Ax - \theta(x)\,x. \tag{108.11}$$

The scalar product under the sign of the real part in (108.10) is then 
positive and we find from (108.11) for $l$ that 

$$\theta(x + \varepsilon l) = \theta(x) + \frac{2\varepsilon}{(x, x)}\,(l, l) + O(\varepsilon^2).$$

We can now choose for a small positive $\varepsilon$ a vector 
$y = x + \varepsilon l$ such that $\theta(y) > \theta(x)$. 

Proceeding in a similar way we obtain, under some additional 
conditions not to be discussed here, a maximum of the Rayleigh 
quotient and hence a maximum eigenvalue and the corresponding 
eigenvector. If at some step it is found that the vector $l$ of (108.11) 
is zero, then this means that $x$ is an eigenvector and the eigenvalue 
corresponding to it equals $\theta(x)$. In this case, it is not possible at 
all to guarantee the increase of the Rayleigh quotient relying only 
on formula (108.10). For this reason the eigenvectors of the operator 
$A$ are called stationary points of the Rayleigh quotient in the 
terminology of mathematical analysis. 

If one or several eigenvectors are known, then in order to determine 
another eigenvector we shall again seek the maximum of the Rayleigh 
quotient, yet not in the entire space but only in the orthogonal 
complement of the subspace spanned by the previously found 
eigenvectors. 
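
The ascent process and its restriction to an orthogonal complement can be sketched for a real symmetric matrix as follows. Everything specific here is an assumption of the illustration and not part of the text: the function names, the starting step `eps`, the halving of the step until the quotient increases, and the stopping rules. The "additional conditions" mentioned above are not checked; in particular the starting vector must have a nonzero component along the eigenvector sought.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, x):
    return [dot(row, x) for row in A]


def rayleigh(A, x):
    """Rayleigh quotient (Ax, x) / (x, x) for a real symmetric matrix A."""
    return dot(matvec(A, x), x) / dot(x, x)


def project_out(v, known):
    """Remove from v its components along the orthonormal vectors in known."""
    for u in known:
        c = dot(v, u)
        v = [vi - c * ui for vi, ui in zip(v, u)]
    return v


def normalized(x):
    norm = math.sqrt(dot(x, x))
    return [xi / norm for xi in x]


def rayleigh_ascent(A, x, known=(), eps=0.1, steps=1000, tol=1e-12):
    """Increase the Rayleigh quotient along the direction (108.11),
    staying orthogonal to the already known unit eigenvectors."""
    x = project_out(x, known)
    for _ in range(steps):
        x = normalized(x)
        theta = rayleigh(A, x)
        # l = Ax - theta(x) x, kept in the orthogonal complement
        l = project_out([ax - theta * xi for ax, xi in zip(matvec(A, x), x)], known)
        if math.sqrt(dot(l, l)) < tol:
            break  # x is (numerically) a stationary point
        step = eps
        while step > 1e-16:
            y = [xi + step * li for xi, li in zip(x, l)]
            if rayleigh(A, y) > theta:
                x = y
                break
            step /= 2
        else:
            break  # no tried step increased the quotient
    x = normalized(x)
    return rayleigh(A, x), x


A = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
t1, v1 = rayleigh_ascent(A, [1.0, 0.0, 0.0])              # largest eigenvalue
t2, v2 = rayleigh_ascent(A, [1.0, 0.0, 0.0], known=[v1])  # next eigenvalue
print(t1, t2)
```

In exact arithmetic the direction $l$ is automatically orthogonal to the known eigenvectors when $x$ is; the extra projection in the sketch only guards against rounding errors.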


Exercises 


1. Prove that each root of the polynomial $\sigma_{r-1}(\lambda)$ of 
(108.4) lies between two adjacent roots of the polynomial $\sigma_r(\lambda)$. 

2. Suppose a tridiagonal symmetric matrix has several zero off-diagonal 
elements. Does this mean that the matrix has multiple eigenvalues? 

3. Prove that, with the coefficient $\alpha_n$ of matrix (108.3) increasing 
(decreasing) without limit, all the eigenvalues except the maximum (minimum) 
one remain bounded. 

4. Prove that, with the absolute value of the off-diagonal coefficient 
$\gamma_{n-1}$ of (108.3) increasing without limit, all the eigenvalues 
except the maximum and the minimum one remain bounded. 

5. A Jacobian matrix of size $2n + 1$ with diagonal elements 
$\alpha_i = |n - i + 1|$ and off-diagonal elements $\gamma_i = 1$ has no 
multiple eigenvalues. Prove that its maximum eigenvalue differs 
from the one closest to it by an amount of an order of $(n!)^{-2}$. 

6. Prove that 

$$\lambda_1 = \max_{x \ne 0} \frac{(Ax, x)}{(x, x)}, \qquad \lambda_n = \min_{x \ne 0} \frac{(Ax, x)}{(x, x)}.$$


7. What does the quantity 

$$\min_{\substack{x \ne 0 \\ x \perp L_r}} \frac{(Ax, x)}{(x, x)}$$

mean in the notation of Theorem 108.3? 




Conclusion 


This textbook has provided ample material for the reader to comprehend 
both the theoretical basis of linear algebra and the numerical methods 
it employs. However, because of the peculiarities of individual 
syllabuses and the limited lecture time allowed for this course some 
sections of this text may escape the reader's attention. We shall 
therefore characterize briefly all the material presented. 

Linear algebra as a science studies sets of special structure and 
functions on them. In general similar problems face other areas of 
mathematics, for example mathematical analysis. A characteristic 
feature of linear algebra in particular is that the sets are always 
finite dimensional vector spaces and the functions are linear operators. 

The general properties of vector spaces are discussed in Sections 10 
and 13 to 21, and the general properties of linear operators are treated 
in Sections 56 to 61 and 63 to 74. The information presented in 
those sections can be obtained in various ways, including direct 
methods employing no concepts or tools but the most elementary. 
Yet one of the additional concepts does deserve comment. It is the 
determinant. 

As a numerical function given on systems of vectors, the determinant 
is a relatively simple object. Nevertheless it possesses many important 
properties. They have made it a widely used tool significantly 
facilitating various studies. Besides, the determinant is very often 
employed in constructing numerical methods. All this has led us to 
give the concept of determinant considerable attention by 
considering its geometrical and algebraic properties in Sections 34 
to 42 and 62. As a tool of study the determinant is used in this book 
to prove diverse assertions. 

Another numerical function of two independent vector variables, 
the scalar product, defines two major classes of vector spaces called 
Euclidean and unitary. The basic new concept in these spaces is 
that of orthogonality. Sections 27 to 33 discuss the properties of 
vector spaces due to the scalar product, and Sections 75 to 81 describe 
the properties of linear operators due to the scalar product. 

Systems of linear algebraic equations play an exceptionally important 
role throughout mathematics, not only in linear algebra. 
Sections 22, 45, 46 and 48 are devoted to the study of their various 
aspects. 

As a rule only the material listed constitutes the basis of a course 
in linear algebra, with a course in analytic geometry added as a 
separate unit. In the present text the necessary facts from analytic 
geometry are given not in isolation but interleaved with the 
corresponding facts from linear algebra. Such a presentation of the 
material has allowed us to attain a number of advantages: to reduce 
many proofs of the same type in both courses, and to emphasize the 
geometrical interpretation of abstract algebraic notions, such as 
vector space, plane in a vector space, determinant, systems of linear 
algebraic equations and so on. 

Linear algebra is greatly enriched with new facts if the 
concept of distance between vectors and that of limit of a sequence 
of vectors are introduced into vector spaces. The necessity of 
introducing them is also dictated by the requirements of numerical 
methods. The metric properties of vector spaces are studied in 
Sections 49 to 54, and those of linear operators are considered in 
Sections 82 to 84. Of course all these facts are usually given in 
functional analysis, but as a rule many results important for finite 
dimensional vector spaces receive little emphasis there. 

Numerical solution of problems in linear algebra is nearly always 
accompanied by the appearance of round-off errors. Therefore the 
future specialist in computation must realize what changes in the 
properties of different objects of linear algebra result from small 
changes in vectors and operators. The effects of small perturbations 
are treated in Sections 33, 87 and 89. 

Properties of many objects of linear algebra may be reversed even 
by arbitrarily small perturbations. Thus a linearly dependent system 
of vectors may become linearly independent or increase its rank, an 
operator with Jordan structure may become one of a simple structure, 
a compatible system of linear algebraic equations may become 
incompatible and so on. All these facts give rise to exceptionally 
great difficulties in practical solutions of problems. 

It is imperative that the reader should once again peruse Section 22 
to study the example given there and to ponder on the questions 
posed at the end of the section. 

Despite the instability of many notions of linear algebra its problems 
can be solved in a stable way. To demonstrate this the book 
includes a description of a stable method of solving systems of linear 
algebraic equations. Its theoretical justification and general 
scheme are given in Sections 85, 86 and 88. 

The last part of the book is devoted to the description and 
investigation of various questions relating to bilinear and quadratic 
forms. These numerical functions play a very important role in linear 
algebra and are closely related to the construction of numerical 
methods. Sections 90 to 94 deal with the general properties of bilinear 
forms and with the connection of their transformations with matrix 
decompositions, Sections 98 to 101 discuss the extended concept of 
orthogonality and Sections 103 to 108 consider uses of bilinear forms 
in computational processes. 